Term
| What does X=T+E stand for? |
|
Definition
Observed score (X) = true score (T) + error (E), for individuals |
|
|
Term
| Face validity |
Definition
Does the item "appear" to be measuring the construct? "Face value": is it obvious that the test is measuring what it is supposed to measure? Face validity can be manipulated. |
|
|
Term
| Content validity |
Definition
Is there a proportionate sampling of the universe of possible items? Variety: all aspects of the construct are taken into account. |
|
|
Term
| Criterion-related validity |
|
Definition
Is there a correlation between the test (x) and some criterion (y)? Predictive (future) vs. concurrent (current). The criterion must be a direct measure (e.g., a rating by someone else), not an indirect measure such as self-report. |
|
|
Term
| Construct validity |
Definition
Do the patterns of correlations with other measures make theoretical sense? "Theoretical validity": congruent, convergent, discriminant |
|
|
Term
| Under construct validity, what is congruent validity? |
|
Definition
correlations with other measures of the same construct -high correlation |
|
|
Term
| Under construct validity, what is convergent validity? |
|
Definition
correlations with measures of similar constructs -intermediate correlation |
|
|
Term
| Under construct validity, what is discriminant validity? |
|
Definition
correlations with measures of unrelated constructs -low correlation |
|
|
Term
| Given y=5x+12 and SEM=3, what is the 95% confidence interval when x=2? |
|
Definition
Interpretation/Translation: 22 ± 6, or 16 ≤ y ≤ 28
**Work shown: y = 5(2) + 12 = 22; for 95% confidence the margin is 2 × SEM = 2(3) = 6, so 22 ± 6** |
|
|
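The worked example above can be checked with a short Python sketch. The function name `prediction_interval` and its parameters are hypothetical, and the multiplier of 2 for 95% confidence is the rule of thumb used in these notes (the exact normal value is about 1.96).

```python
def prediction_interval(slope, intercept, x, sem, multiplier=2):
    """Return (predicted y, lower bound, upper bound).

    multiplier=1 gives the ~68% band and multiplier=2 the ~95% band,
    following the rule of thumb used in these notes.
    """
    y = slope * x + intercept   # predicted score from y = slope*x + intercept
    margin = multiplier * sem   # half-width of the interval
    return y, y - margin, y + margin


# Card example: y = 5x + 12, SEM = 3, x = 2
print(prediction_interval(5, 12, 2, 3))  # (22, 16, 28)
```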
Term
| What are the sources of error? |
|
Definition
-true change -item sampling -statistical factors -random factors |
|
|
Term
| Which source of error most affects test/retest and split half reliability assessments? |
|
Definition
test/retest: true change; split half: statistical factors |
|
|
Term
| What is the goal of test/retest? |
|
Definition
| stability over time, with an acceptable reliability of r ≥ .80 |
|
|
Term
| What are the types of reliability methods with the goal of internal consistency? |
|
Definition
-alternate forms (items measuring the same construct) -split half (correlating one half of the test with the other half, such as odd vs. even items, with the Spearman-Brown prophecy formula to correct) -Cronbach's alpha (a way of estimating the average correlation among all possible pairs of items, e.g., on a Likert scale) -KR20 & KR21 (same as Cronbach's alpha but for dichotomous items) |
|
|
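As an illustration of Cronbach's alpha, the sketch below uses the standard variance-based formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), which is not spelled out in these notes. The function name and the sample data are hypothetical. With 0/1 (dichotomous) items and population variances, the same computation gives KR20.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (all the same length)."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose: one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(r) for r in rows])  # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical 5 respondents x 3 Likert items
sample = [[4, 5, 4], [3, 3, 2], [5, 4, 5], [2, 2, 3], [1, 2, 1]]
print(round(cronbach_alpha(sample), 2))
```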
Term
| What is the Spearman-Brown prophecy formula? |
|
Definition
| to correct the split-half estimate of reliability, which is lowered because each half has only half the items (restricted range) |
|
|
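For the split-half case, the standard Spearman-Brown correction is r_full = 2 × r_half / (1 + r_half). A minimal sketch follows; the function name and the more general `length_factor` parameter are illustrative and do not come from these notes.

```python
def spearman_brown(r_half, length_factor=2):
    """Estimate the reliability of a test lengthened by length_factor.

    With length_factor=2 this is the split-half correction:
    r_full = 2 * r_half / (1 + r_half).
    """
    return (length_factor * r_half) / (1 + (length_factor - 1) * r_half)


# A half-test correlation of .60 corrects to about .75 for the full test.
print(round(spearman_brown(0.60), 2))  # 0.75
```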
Term
| What are the confidence intervals of SEM? |
|
Definition
68% confidence = score ± 1 SEM; 95% confidence = score ± 2 SEM |
|
|
Term
| What is the purpose of item analysis in general? |
|
Definition
| to detect and remove items that fail to discriminate to enhance reliability and validity of the test |
|
|
Term
| Explain item analysis for maximal performance measures. |
|
Definition
| Define a "top group" and a "bottom group", then compute the difficulty index (p) and the discrimination index (D) |
|
|
Term
| What is the difficulty index? |
|
Definition
Indicated by "p". Equation: p = (p top + p bottom)/2. Optimum p = .50; higher = easier, lower = more difficult |
|
|
Term
| What is the discrimination index? |
|
Definition
Indicated by "D". Equation: D = p top - p bottom. Acceptable discrimination is D ≥ .30; unacceptable is D < .30 |
|
|
Term
| What is intrinsic ambiguity? |
|
Definition
bad kind of ambiguity; unacceptable discrimination index; top group does not have the knowledge |
|
|
Term
| What is extrinsic ambiguity? |
|
Definition
good kind of ambiguity; acceptable discrimination index; bottom group does not have the knowledge |
|
|
Term
Item Analysis Maximal Performance Sample:
Group | # correct | # incorrect | p | D
T | 7 | 3 | |
B | 4 | 6 | |
Give p, give D, and give a 3-part interpretation |
|
Definition
p = .55 (more than 1/2 got it right); D = .30. Interpretation: slightly easy (because p is above .50), acceptable discrimination (because D is .30), and extrinsic ambiguity (because D is acceptable) |
|
|
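The sample item above can be worked through in Python. This is a sketch under the definitions given in these notes (p as the mean of the two group proportions, D as their difference, D ≥ .30 as the cutoff, and acceptable D read as extrinsic ambiguity). The function names are hypothetical, and exact fractions are used so the comparison at the .30 cutoff is not affected by floating-point rounding.

```python
from fractions import Fraction


def item_analysis(top_correct, top_n, bottom_correct, bottom_n):
    """Return (p, D) for one item from top- and bottom-group counts."""
    p_top = Fraction(top_correct, top_n)
    p_bottom = Fraction(bottom_correct, bottom_n)
    p = (p_top + p_bottom) / 2   # difficulty index
    d = p_top - p_bottom         # discrimination index
    return p, d


def interpret(p, d, cutoff=Fraction(3, 10)):
    """Three-part interpretation: difficulty, discrimination, ambiguity."""
    if p > Fraction(1, 2):
        difficulty = "easy side"
    elif p < Fraction(1, 2):
        difficulty = "difficult side"
    else:
        difficulty = "optimal"
    acceptable = d >= cutoff
    # Per these notes: acceptable D -> extrinsic ambiguity, otherwise intrinsic.
    ambiguity = "extrinsic" if acceptable else "intrinsic"
    return difficulty, "acceptable" if acceptable else "unacceptable", ambiguity


# Card example: top group 7 correct of 10, bottom group 4 correct of 10
p, d = item_analysis(7, 10, 4, 10)
print(float(p), float(d), interpret(p, d))
```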
Term
| Briefly explain item analysis of typical performance measures. |
|
Definition
| Since there are no right or wrong answers, the item difficulty index (p) is not used (difficulty level is irrelevant) |
|
|
Term
| What are the 2 measures of discrimination in typical performance item analysis? |
|
Definition
--D = discrimination index (% answering yes in the top group minus % answering yes in the bottom group); D ≥ .30 is acceptable --Rpb = item-total correlation (correlation between the item score and the total score; the closer to 1, the better) |
|
|
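The item-total correlation can be sketched as an ordinary Pearson correlation between item scores and total scores; with yes/no items coded 1/0 this is the point-biserial correlation. The function names and the data below are hypothetical, and the sketch does not include the significance test.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def item_total_correlation(item_scores, total_scores):
    """Point-biserial r when item_scores are coded 0 (no) / 1 (yes)."""
    return pearson_r(item_scores, total_scores)


# Hypothetical data: six respondents, one yes/no item, and their total scores
item = [1, 1, 1, 0, 0, 0]
totals = [18, 16, 15, 9, 8, 6]
print(round(item_total_correlation(item, totals), 2))
```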
Term
| What does item-total correlation rely on? |
|
Definition
It uses statistical significance as the criterion, which takes sample size into account (D does not). Item-total correlation is therefore a more sensitive measure of discrimination. |
|
|
Term
| What type of validity is emphasized by achievement tests? |
|
Definition
| content validity |
|
Term
| What type of validity is emphasized by aptitude tests? |
|
Definition
| criterion-related and construct validities |
|
|
Term
| Explain 4 of the subtests of the Stanford-Binet. |
|
Definition
-verbal reasoning, crystallized -quantitative reasoning, crystallized -abstract/visual reasoning, fluid -short term memory, memory specific |
|
|
Term
| How is content validity addressed on achievement tests in schools? |
|
Definition
| a matter of finding the best fit between the school/district curriculum and the emphasis of a given test |
|
|
Term
| How is basal age computed in the Stanford-Binet? |
|
Definition
| highest level at which all items are passed |
|
|
Term
| How is ceiling age computed on the Stanford-Binet? |
|
Definition
| lowest level at which all items are answered incorrectly |
|
|
Term
| How is the Stanford-Binet scored? |
|
Definition
| basal age + months of credit for items answered correctly |
|
|
Term
What is the current deviation IQ equation? Why is deviation IQ preferred to the old ratio IQ? |
|
Definition
Deviation IQ = 16z + 100. It is preferred because a deviation IQ means the same thing at every age level. |
|
|
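The deviation IQ formula can be sketched in Python as follows. The function name and the norm-group parameters are hypothetical; the SD of 16 is the value given in these notes.

```python
def deviation_iq(raw_score, norm_mean, norm_sd, iq_sd=16):
    """Convert a raw score to a deviation IQ: IQ = 100 + iq_sd * z."""
    # z-score relative to the test taker's own age group
    z = (raw_score - norm_mean) / norm_sd
    return 100 + iq_sd * z


# A score one SD above the age-group mean gives an IQ of 116.
print(deviation_iq(60, 50, 10))  # 116.0
```

Because z is computed against the test taker's own age group, the same IQ value marks the same relative standing at any age.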
Term
| What is g-factor & what is s-factor? |
|
Definition
G-factor: general factor, the core of intelligence. S-factor: specific factors, unique understanding |
|
|
Term
| What are the verbal subtests of the WAIS-R (Wechsler Adult Intelligence Scale)? |
|
Definition
1 information 2 comprehension 3 arithmetic 4 similarities 5 vocabulary 6 digit span 7 letter-number sequencing |
|
|
Term
| What are the performance subtests for the WAIS-R? |
|
Definition
8 picture completion 9 block design 10 picture arrangement 11 object assembly 12 digit symbol 13 matrix reasoning 14 symbol search |
|
|
Term
| Explain Cattell's fluid and crystallized IQ concept. |
|
Definition
Fluid IQ is innate and is tested with matrix tests.
Crystallized IQ is gained through experience with language and is tested with vocabulary and verbal tests. |
|
|
Term
| What is the most common IQ test? |
|
Definition
| WAIS-R (Wechsler Adult Intelligence Scale) |
|
|
Term
| What does the public law IDEA (1990) provide for? |
|
Definition
For special needs students: 1 least restrictive environment; 2 individualized education programs (IEPs); 3 clearly defined disorders |
|
|
Term
| What does the public law PL95-561 provide for? |
|
Definition
For gifted and talented students: incentives to schools offering gifted programs (gifted is 1 1/2 SD above average) |
|
|
Term
| What is mental retardation? |
|
Definition
| IQ at least 2 SD below mean, some states also require assessment of psychosocial maturity |
|
|
Term
| What are learning disorders? |
|
Definition
| LD: not attributable to any other disorder; a discrepancy of 1 1/2 SD between aptitude and achievement |
|
|
Term
| What are behavioral disorders? |
|
Definition
Defined by the presence or absence of key behaviors; requires behavioral assessment. Examples: ADD, ADHD |
|
|
Term
| What are the 3 issues of test bias? |
|
Definition
-measurement bias -prediction bias -content bias |
|
|
Term
| What is measurement bias? |
|
Definition
differential difficulty across groups; example: item difficulty by group |
|
|
Term
| What is prediction bias? |
Definition
differential prediction of scholastic success across groups (predicting better for one group than another); example: IQ predicting school GPA |
|
|
Term
| What is content bias? |
Definition
content of items biased towards the dominant group; examples: regional, cultural |
|
|
Term
| What are the 3 approaches to cross-cultural testing? |
|
Definition
-mainstream approach -pluralistic approach -"culture-fair" tests |
|
|
Term
| What is the mainstream approach to cross-cultural testing? |
|
Definition
One set of norms, not differentiated by group (race/culture). Tests should reflect the dominant culture, since schools are embedded in it. |
|
|
Term
| What is the pluralistic approach to cross-cultural testing? |
|
Definition
Compare "like with like": same test, with different norms for different groups |
|
|
Term
| What is the "Culture Fair" test approach to cross-cultural testing? |
|
Definition
Use of non-verbal items: assessments are changed by removing all words so that anyone can take the test |
|
|
Term
| What are the problems with the "culture fair" test approach to cross-cultural testing? |
|
Definition
-assumptions of universal understanding (there is no such thing) -restricted range of abilities tested -lack of comparability with mainstream IQ tests |
|
|
Term
| If you were constructing a depression scale, how would you deal with face validity and content validity? |
|
Definition
face validity: lower the face validity so that it is not apparent that the scale measures depression, so that the test taker does not skew the answers one way or the other
content validity: make sure there are questions covering all possible parts, types, and situations of depression. |
|
|
Term
| If you were constructing a depression scale, how would you deal with criterion-related validity and construct validity? |
|
Definition
criterion-related validity: look for correlations with scores on tests such as a social desirability scale, a body attitude scale, and the stress quiz; see whether another test predicts depression in the client; see whether the depression scale matches counselor observations
construct validity: make sure the test correlates highly with other depression scales, such as the Beck Depression Inventory, and correlates low with unrelated scales, such as the masculine/feminine scale |
|
|