participants divided by random 


trait or characteristic with two or more categories 


stimulus or input. "CAUSE" Researchers usually physically manipulate the independent variable; physically administer treatment. (In nonexperimental studies, researchers do not physically manipulate the IV. They observe how it occurs naturally.)


outcome ("EFFECT"). What you measure 


Degree to which measures actually measure what they intend to measure. Looks at the test 


if it only assesses the affective dimension of depression but fails to take into account the behavioral dimension. 


ability to generalize beyond sample and conditions that yielded the findings, to the population. Directly tied to sampling. 


to what extent does the test predict the outcome it is supposed to predict?


Term
Correlation (validity) coefficient 

0.00 (no relationship) to 1.00 (perfect validity) 


Construct validity refers to whether a scale measures or correlates with the theorized psychological scientific construct (e.g., "fluid intelligence") that it purports to measure.


A measure can be reliable and not valid, but it cannot be valid and unreliable. Reliability is for scores & validity is for tests.


Term
Interobserver reliability coefficients 

agreement between observers. r between 2 different raters' scores


measure at two points in time using the exact same form. Want to show no difference: what is scored is reliable. Make sure they are answering the same way the whole time. Do not want a difference in scores.


Term
Parallel-forms reliability

Same content, different form. 2 parallel forms, interchangeable, different items that cover the same content. 


use scores from a single administration of a test to examine the consistency of test scores. Items consistent with each other. A person who answers a certain way on any item is likely to answer in the same way on other items in the same scale


Score the test as though it consisted of two separate tests (odd-even split). A form of internal consistency.
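
The odd-even split can be sketched in a few lines of Python. This is only an illustration: the item scores are invented, and `pearson_r` is a helper written here for the example rather than anything from the original notes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: one row per examinee, 1 = correct, 0 = incorrect
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Score the odd-numbered and even-numbered items as two separate "tests"
odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6

# Correlate the two half scores (about 0.91 for this made-up data)
split_half_r = pearson_r(odd_half, even_half)
print(round(split_half_r, 2))
```

A high correlation between the halves suggests the items are measuring consistently within a single administration.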


Items are consistent with each other; tests for internal consistency. Single administration of a test; math is done to obtain the equivalent of the average of all possible split-half reliability coefficients. Not assessed over time. Most common. Shoot for a high score.
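
As a rough sketch of the math involved, the commonly used formula for Cronbach's alpha is k/(k-1) × (1 − sum of item variances / variance of total scores). The ratings below are invented, and the function name `cronbach_alpha` is chosen here for illustration.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: 5 respondents answering 3 items on a 1-5 scale
ratings = [
    [4, 5, 4],
    [3, 3, 3],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
]

alpha = cronbach_alpha(ratings)
print(round(alpha, 2))  # close to 1 means the items hang together well
```

For this made-up data alpha is roughly 0.95, which would count as a high score.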


tests designed to facilitate a comparison of an individual's performance with that of a norm group (percentile rank). Meant to be of medium difficulty. 
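
A percentile rank can be illustrated with a short sketch. It assumes one common definition (the percentage of the norm group scoring below the examinee); other definitions count ties differently. The norm-group scores are invented.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group that scored below the given score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

# Hypothetical norm group of 10 examinees
norm_group = [55, 60, 62, 65, 70, 70, 75, 80, 85, 90]

rank = percentile_rank(75, norm_group)
print(rank)  # 60.0: the examinee scored higher than 60% of the norm group
```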


Term
Criterion-referenced tests

measures the extent to which individual examinees have met performance standards. Difficulty not a concern. 


Term
Pretest-posttest randomized control group design

if the experimental group differs, then the difference is attributed to either the treatment or error. WANT A DIFFERENCE


degree to which results of study can be attributed to treatments or other independent variables. 


Term
Threats to internal validity 

1. History: environmental influences (ex. events that occur during research)
2. Maturation: subjects matured
3. Testing effects: things learned from pretest influenced later behavior
4. Ceiling effects: many scores are near the maximum possible. May not detect true differences
5. Floor effects: many scores are near the minimum possible. May not detect true differences
6. Instrumentation: changes in measurement. Groups take tests under different conditions


2 groups, not randomly selected 


attention effect. when the subjects know they are being studied 


control group may try to outperform the experimental group 


person dispensing drug doesn't know either 


participants know what experimenters are looking for 


does not mean meaningful significance. 


examine relationships between groups 


interval data, distributed normally in population 


summarize data so that it can be easily comprehended 


displays how scores are distributed. 


help researchers draw inferences about the effects of sampling errors on the results that are described with descriptive statistics. Helps researchers make generalizations about the characteristics of the populations on the basis of data obtained by studying samples. Used to infer from our sample to the population. Generalize. 3 Types: Chi-square, t test, ANOVA


help readers interpret results in light of sampling error. 


likelihood that a finding in the sample exists in the population. Any difference is due to real influence, rather than chance or sampling error.


helps researchers decide whether the differences in descriptive statistics they identify are reliable. Determine the probability that the null hypothesis is true.


helps distinguish between values obtained from sample and values obtained from a census 


test of the null hypothesis regarding differences between frequencies. Categorical data analyzed. Can be used for treatments also (posttest looks just like pretest)
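
A minimal sketch of the chi-square calculation for frequencies, using the standard formula (sum of (observed − expected)² / expected). The counts are invented, and 3.841 is the usual table value for the .05 level with 1 degree of freedom.

```python
def chi_square(observed, expected):
    """Chi-square statistic from observed and expected frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 100 people choose between two options.
# Under the null hypothesis, no preference means 50 expected in each category.
observed = [60, 40]
expected = [50, 50]

chi2 = chi_square(observed, expected)

CRITICAL_05_DF1 = 3.841  # table value for p = .05, df = 1
reject_null = chi2 > CRITICAL_05_DF1
print(chi2, reject_null)
```

Here the statistic exceeds the critical value, so the difference between the frequencies would be declared statistically significant at the .05 level.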


total minus 2 (e.g., the total number of participants in two groups, minus 2). A substep for obtaining the value of p, the probability that the null hypothesis is correct.


Term
When null hypothesis not rejected p > .05 

statistically insignificant. 


Term
When the probability that the null hypothesis is correct is .05 or less

reject the null hypothesis. Statistically significant. Less than 5% chance due to sampling error. 


each participant is classified in terms of two variables in order to examine the relationship between them. 


when a null hypothesis is rejected and it is in fact a correct hypothesis. 


when researchers fail to reject the null hypothesis when it is incorrect 


variability: shows how much the scores vary from the mean. Symbols: S or SD (population), sd (sample). "On average, the scores varied from the mean by ___."
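
To make the population/sample distinction concrete, Python's standard `statistics` module offers both versions: `pstdev` divides by N (population) and `stdev` divides by N − 1 (sample). The scores are invented for illustration.

```python
from statistics import mean, pstdev, stdev

# Hypothetical set of scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]

average = mean(scores)          # 5
population_sd = pstdev(scores)  # divides by N
sample_sd = stdev(scores)       # divides by N - 1, so it is slightly larger

print(average, population_sd, round(sample_sd, 2))
```

For these scores the population value is 2.0, so on average the scores varied from the mean of 5 by about 2 points.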


Term
Pearson Correlation Coefficient (Pearson r) : 

relationship between 2 quantitative sets of scores. 
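
A small sketch of how Pearson r can be computed by hand, using the standard formula (covariation of the two sets of scores divided by the product of their spreads). The study-hours and quiz-score data are invented.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r for two equal-length lists of quantitative scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied and quiz scores for five students
hours = [1, 2, 3, 4, 5]
quiz = [2, 4, 5, 4, 5]

r_direct = pearson_r(hours, quiz)            # positive: both tend to be high together
r_inverse = pearson_r(hours, [5, 4, 3, 2, 1])  # perfectly inverse: -1.0

print(round(r_direct, 2), round(r_inverse, 2))
```

The first pair gives r of about 0.77 (a direct relationship); the second gives −1.00 (a perfect inverse relationship).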


Direct relationship (positive relationship): high in both variables (or low in both)


Term
Inverse relationship (negative relationship): 

high in one variable and low in the other. 


set of ratings; standard by which the test is being judged 


to what extent does the test predict the outcome it is supposed to predict?


Term
Concurrent Validity Coefficient: 

obtained by administering the test and collecting the criterion data at about the same time. Happening right now. 


relies on subjective judgements and empirical data. Hypothesize a relationship between the test scores and scores on another variable. Examples: score on a depression scale and success in college. 


collection of related behaviors that are associated in a meaningful way (ex. depression) 


how variables are related to one another.
1. Positive/direct relationship: both high or both low (same direction)
2. Negative/inverse relationship: opposite directions
3. Allows us to predict either variable from knowledge of the other. Always exceptions because there is never a perfect relationship
4. Nonexperimental: no IV and DV
5. Cannot be used to determine causation.


interval data (parametric). Difference between group mean scores. The t test: used to test the null hypothesis regarding the observed difference between two means. 
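
A sketch of the t test for two independent group means. It assumes the pooled-variance (equal variances) version, with degrees of freedom equal to the total number of participants minus 2. The scores are invented, and 2.306 is the usual two-tailed table value for the .05 level with 8 df.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2  # total minus 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df

# Hypothetical data: scores for a treatment group and a control group
treatment = [5, 6, 7, 8, 9]
control = [2, 3, 4, 5, 6]

t_value, df = independent_t(treatment, control)

CRITICAL_05_DF8 = 2.306  # two-tailed table value for p = .05, df = 8
reject_null = abs(t_value) > CRITICAL_05_DF8
print(round(t_value, 2), df, reject_null)
```

With these made-up scores t is 3.0 with 8 df, which exceeds the critical value, so the null hypothesis about the difference between the two means would be rejected.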


Term
One Way Analysis of Variance (ANOVA) 

F value: similar to the t test; indicates whether the null hypothesis is correct. ANOVA can compare many means (the t test can only compare 2), and can be used if sample sizes are large and unequal. ANOVA indicates whether a set of differences is significant overall by separating variance due to: * within groups (chance) * between groups (treatment)
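
The separation of between-groups and within-groups variance can be sketched as follows. This is a bare-bones one-way ANOVA with invented scores; 5.14 is the usual table value of F for the .05 level with 2 and 6 degrees of freedom.

```python
def one_way_anova_f(groups):
    """F ratio: between-groups mean square over within-groups mean square."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)

    # Between groups: how far each group mean is from the grand mean (treatment)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each score is from its own group mean (chance)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical data: three groups of three participants each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_value = one_way_anova_f(groups)

CRITICAL_05_DF2_6 = 5.14  # table value for p = .05, df = (2, 6)
reject_null = f_value > CRITICAL_05_DF2_6
print(round(f_value, 2), reject_null)
```

A significant F only says the set of differences is significant overall; it does not identify which pair of means differs.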


one treatment, many groups. Participants classified in only one way. One factor being explored, with 3 or more groups within this factor. Used rather than running multiple t tests; determines whether a difference is statistically significant (but will not tell you which one)


two-way classification. More than one treatment factor being explored. Examines main effects and explores interactions among variables. Example: 3 x 2 factorial design.


effect of an independent variable on a dependent variable averaging across the levels of any other independent variables 


do they affect one another? Are they dependent? Interaction between variables. When graphed: do lines cross?


Term
Statistical significance: 

whether a difference is reliable in light of random errors. 


collection of related behaviors that are associated in a meaningful way 


ensure that subgroups w/in population are represented proportionally in sample; way to decrease sampling error b/c sample is more representative of pop.; can also be used to make sure subgroups are represented equally


1. Cannot i.d. every member of the population 2. Convenience samples 3. Volunteerism 


Term
Significance and Sample size

Significance becomes more likely as sample size increases. As a general rule, the larger the random sample, the smaller the sampling error, or, the more precise the results are. Precision = extent to which the same results would be obtained if another random sample were drawn from the same population. Increasing sample size also produces diminishing returns.
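
The diminishing returns can be illustrated with the standard error of the mean, SD / √n. That formula is brought in here as an illustration and is not part of the original notes; the standard deviation of 15 and the sample sizes are invented.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean for a random sample of size n."""
    return sd / sqrt(n)

SD = 15  # hypothetical standard deviation of the scores

sample_sizes = [25, 100, 400]
errors = [standard_error(SD, n) for n in sample_sizes]

for n, se in zip(sample_sizes, errors):
    print(n, se)
```

Going from 25 to 100 participants cuts the error from 3.0 to 1.5, but it takes 400 participants to halve it again to 0.75: each extra gain in precision costs four times as many participants.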


Degree to which measures produce consistent results. Goal: to create measures that consistently show differences b/t individuals who really are different, and show the same scores for individuals who are the same. Reliability of Scores, Not Tests.


theoretical construct referring to a person’s score containing no error; actual amount of whatever is being measured (ability, self-esteem, knowledge, etc.)


difference between a person’s true score & the score actually obtained


Term
Factors that might cause measurement error: 

1. Test’s items are only a sample of the total possible items that might be used to measure the construct 2. Test administrators 3. Test scorers 4. Testing conditions 5. Variability in how individuals feel


Term
Criterion-Referenced Tests

test items relate to instructional objectives; criteria for “success” determined ahead of time; no distribution of scores is done


not meant to show attainment of specific learning objectives; show how students/schools/etc. compare with each other; individual score translated to a converted score to determine “relative standing”; intended to disperse scores across the normal curve


Study exploring relationship between teachers’ culturally held beliefs and student achievement. 


if you choose an extreme group on any 1 measure, the group will tend to be less extreme on another measure, even if the 2 measures are highly correlated


when occur at random and equally among the groups, not a problem 


Term
This would be used to figure out the mean semester gpa differences between involved and uninvolved students 

Term
Painters and dancers scored higher than accountants on a researcher’s creativity test. What type of validity does this demonstrate?

Term
A study investigated the impact of a leadership development program in students' first year on the subsequent leadership behaviors of the students in their senior year. What are the IV and DV?

IV = program, DV = behaviors


Term
A study is conducted to measure the extent to which alcohol use, drug use, and violence affect grades for high school students. What test should be used?

Term
A researcher wants to study the relationship between ACT scores and GPA

Term
What sampling method would be used if a teacher uses students in their class 

Term
Professor X conducts a study over a 2 year period. During this time 20 of the original 75 drop out 

Term
A professor gives a test on US history; however, most of the questions are directed at German history. What type of measurement validity is being threatened?

Term
A researcher tested the relationship between emotional intelligence and empathy. The correlation between these 2 constructs was .72. What do these results mean?

positive correlation WHAT DOES THIS MEAN 


Term
A researcher wanted to know the community's feelings about the library hours. They sat in front of the library and surveyed volunteers. What type of sampling is this?

Term
A researcher is comparing the average number of hours a student studies every year of school. The researcher gets a t value of 4.30 w/ 2 df. The researcher decides to reject the null hypothesis at the .05 level. Upon further research, there was no significance. What type of error was this?

Term
Dr. Tammy administers a test regarding leadership behaviors and collects results using criterion data. Which type of validity is this?

Term
If a researcher makes judgements about the appropriateness of the contents in a measure, the researcher is checking for

Term
Relies on subjective judgements and empirical data 

Term
Threats to External Validity 

1. Nonrepresentativeness: is the sample representative of the larger population?
2. Artificiality: findings of a small, brief, or contrived study may not apply to a realistic setting. 3 Types: nontypical task, nontypical instruction, control group


describes things as they are or once were; uses descriptive stats


(experimental research design): compare 2+ groups; done to determine existence & nature of difference; uses inferential stats


Term

Involves interval data, assumed to be distributed normally in population


involves data not assumed to be normally distributed in population; frequently used when data can be placed into categories


Term
Compute the test statistic value: 

value you would expect the test statistic to yield if the null hypothesis is indeed true 


used to infer something about the population based on the sample's characteristics; how much confidence one can have when generalizing from a sample to a population


Purposive, smaller sample size (not random), should include demographic info for decisions of transferability 


interviews (focus groups), observations, text-rich data (journals, open-ended questionnaires)


Term
Trustworthiness includes... 

Truth value/ credibility; Transferability; Consistency/ dependability; Confirmability 


Term
Methods to reach trustworthiness 

1. Prolonged engagement 2. persistent observation 3. triangulation 4. peer debriefing 5. negative case analysis 6. referential adequacy 7. member checking


Researcher aware of multiple influences & contextual factors that influence phenomenon. Potential Danger: “going native”


I.d. characteristics most relevant to issue being pursued & focus on them. Potential Danger: premature closure (coming to focus too soon)


Way to support findings by showing that independent measures of it agree (corroborate). 3 Types:
1. Data source (persons, times, places)
2. Method (observation, interview, documents)
3. Researchers (investigator A, B, C)


Process of exposing self to peer to explore aspects of inquiry that may otherwise remain only implicit w/in R's mind


Process of revising hypotheses. Often linked to persistent observation. Requires R to look for disconfirming data in both past & future data 


Recorded material provides “benchmark” against which later data analysis & interpretation can be tested for adequacy.


Process of sharing data/findings with participants. Purposes:
1. Opportunity for participants to correct errors &/or challenge interpretations
2. Opportunity to volunteer additional information
3. Puts participants on record as agreeing w/ R's interpretation

