Selecting items involves considering each item's relevance, difficulty level, and ability to discriminate between examinees with different levels of the characteristic being studied.


Does the item assess the content of the behavioral domain the test is designed to evaluate? 


Does it reflect the appropriate cognitive or ability level? 


Does it require knowledge, skills, or abilities outside of the domain of interest?


The extent to which a test item distinguishes between high and low scorers on the whole test. D ranges from -1.0 to +1.0. D = +1.0 if all examinees in the upper group and none in the lower group answer the item correctly; D = -1.0 if the opposite occurs. D = .35 or higher is considered acceptable. Items of moderate difficulty have the greatest potential for discrimination.
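As an illustration of the index described above, here is a minimal sketch that computes D as the proportion correct in the upper group minus the proportion correct in the lower group. The function name, the 0/1 coding of answers, and the sample data are assumptions made for the example; how the upper and lower groups are formed is left to the test developer.

```python
def discrimination_index(upper_correct, lower_correct):
    """Return D = p(upper group correct) - p(lower group correct).

    Each argument is a list of 0/1 flags (1 = answered the item correctly).
    """
    if not upper_correct or not lower_correct:
        raise ValueError("both groups must contain at least one examinee")
    p_upper = sum(upper_correct) / len(upper_correct)
    p_lower = sum(lower_correct) / len(lower_correct)
    return p_upper - p_lower


# Hypothetical data: 10 examinees in each group.
upper = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # 8 of 10 correct
lower = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # 3 of 10 correct
d = discrimination_index(upper, lower)
print(round(d, 2))  # 0.5, above the .35 guideline
```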


Item characteristics derived from IRT are the same across samples. It is possible to equate scores from different sets of items and from different tests because test scores are reported in terms of level on the trait measured. Like GPA rather than individual scores in different classes (90% in math is different from 90% in English).


Term
Item Characteristic Curve 

Definition
The difficulty level of an item is indicated by the ability level at which 50% of examinees obtain a correct response. The item's ability to discriminate between high and low achievers is indicated by the slope of the curve. The steeper the slope, the greater the discrimination.
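One way to picture the curve is with a logistic function. The sketch below assumes a two-parameter logistic model with no guessing, which the card itself does not name; `a` (slope) and `b` (difficulty) are the conventional parameter names, and the values used are hypothetical.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response at ability level theta.

    a = discrimination (slope), b = difficulty (ability level where P = 0.5).
    Assumes a two-parameter logistic model with no guessing parameter.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# At theta == b the probability is exactly 0.5, whatever the slope.
print(icc_2pl(theta=0.5, a=1.2, b=0.5))  # 0.5

# A steeper item separates examinees just above and below b more sharply.
flat_gap = icc_2pl(1.0, a=0.5, b=0.5) - icc_2pl(0.0, a=0.5, b=0.5)
steep_gap = icc_2pl(1.0, a=2.0, b=0.5) - icc_2pl(0.0, a=2.0, b=0.5)
print(steep_gap > flat_gap)  # True
```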


How much the scores reflect the truth and how much they reflect error. An estimate of the proportion of variability in examinees' obtained scores that is due to true differences among examinees on the attributes measured by the test. CONSISTENCY = RELIABILITY


A correlation coefficient ranging from 0.0 to +1.0, obtained by correlating the test with itself. Ex. .84 indicates that 84% of the variability in scores is due to true score differences among examinees, while 16% is due to measurement error.


Term
Internal Consistency Reliability: Split-Half

Definition
Test is split in 2. Scores on the 2 halves are correlated. Problem: reliability is based on half the length of the test, and reliability decreases as test length decreases, so the split-half coefficient usually underestimates true reliability.


Term
Spearman-Brown Prophecy Formula

Definition
Corrects the split-half reliability: estimates what the reliability coefficient would have been if based on the full length of the test.
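The correction can be sketched as follows, using the general form of the formula, r_new = n * r / (1 + (n - 1) * r), where n is the factor by which the test is lengthened (n = 2 for a split-half correction). The function name and the example coefficient of .70 are assumptions chosen for illustration.

```python
def spearman_brown(r, length_factor=2.0):
    """Estimate reliability after changing test length by length_factor.

    r is the observed reliability (e.g., the correlation between two halves).
    length_factor = 2 gives the usual split-half correction.
    """
    return (length_factor * r) / (1.0 + (length_factor - 1.0) * r)


# Hypothetical split-half correlation of .70, corrected to full length.
r_half = 0.70
r_full = spearman_brown(r_half)
print(round(r_full, 3))  # 0.824
```

The corrected value is higher than the half-test correlation, which matches the point that an uncorrected split-half coefficient underestimates reliability.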


Term
Cronbach's Coefficient Alpha 

Definition
Administer the test 1 time to a single group. A formula determines the average degree of inter-item consistency (the average obtained from all possible splits of the test). When test items are scored dichotomously (right/wrong), use KUDER-RICHARDSON Formula 20 (KR-20).
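A minimal sketch of the computation, assuming the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), with population variances throughout. The function name and the small right/wrong data set are hypothetical; with dichotomous items scored this way, the result is the same value KR-20 would give.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a score table.

    scores: list of rows, one row per examinee, one column per item.
    Raises ZeroDivisionError if every examinee has the same total score.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical data: 5 examinees, 4 items scored right (1) or wrong (0).
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(data)
print(round(alpha, 2))  # 0.8
```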


Calculate a reliability coefficient with the KAPPA statistic. Can determine % of agreement between 2 raters. Error sources: lack of motivation of the rater, rater biases, the measuring device.
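To show how percent agreement and kappa differ, here is a sketch assuming Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance. The function name and the ratings are hypothetical.

```python
from collections import Counter


def agreement_and_kappa(rater_a, rater_b):
    """Return (proportion of agreement, Cohen's kappa) for two raters.

    Raises ZeroDivisionError if chance agreement is 1 (both raters always
    use the same single category).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must rate the same non-empty set of cases")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of 10 cases by 2 raters.
a = ["yes"] * 6 + ["no"] * 4
b = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", "yes"]
p_agree, kappa = agreement_and_kappa(a, b)
print(round(p_agree, 2), round(kappa, 2))  # 0.8 0.58
```

Kappa is lower than the raw percent agreement because it removes the agreement the raters would reach by chance alone.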


Term
Factors that Affect the Reliability Coefficient 

Definition
1. Test Length: the longer the test, the larger the test's reliability coefficient.
2. Range of Test Scores: the reliability coefficient is maximized when the range of scores is unrestricted.
3. Guessing: as the probability of guessing right increases, the reliability coefficient decreases.


Term
Standard Error of Measurement 

Definition
An index of the amount of error expected in an individual's obtained score due to the unreliability of the test. * If the raw score was converted to a percentile rank, the confidence interval is called a percentile band. Ex. Polls: 45% +/- 3% (3% standard error of measurement)
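As a worked illustration, the sketch below assumes the standard formula SEM = SD * sqrt(1 - reliability) and a 95% confidence interval of plus or minus 1.96 SEM around the obtained score. The function names and the example values (SD = 15, reliability = .84, obtained score = 100) are assumptions for the example only.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(obtained, sem, z=1.96):
    """Band around an obtained score; z = 1.96 gives roughly 95%."""
    return obtained - z * sem, obtained + z * sem


sem = standard_error_of_measurement(sd=15, reliability=0.84)
low, high = confidence_interval(100, sem)
print(round(sem, 2))                   # 6.0
print(round(low, 2), round(high, 2))   # 88.24 111.76
```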


Refers to test accuracy. A test is VALID when it measures what it is intended to measure.


Associated with achievement tests that measure knowledge of 1 or more content domains. Involves clear identification of the content and then writing or selecting items that represent it. Establishment relies on the judgment of subject matter experts.


Measures the hypothetical trait it is intended to measure.
* Do all items measure the same construct?
* Does it accurately distinguish between people who have different levels of the construct?
* Do test scores change with manipulation in a direction predicted by theory?


Term
Multitrait-multimethod matrix

Definition
Used to systematically organize data collected when assessing a test's convergent and discriminant validity.


Term
Monotrait-monomethod coefficients

Definition
* Same trait, same method. Reliability coefficients indicating the correlation between a measure and itself.


Term
Monotrait-heteromethod coefficients

Definition
* Same trait, different methods. Correlations between different measures of the same trait. When large, indicates convergent validity.


Term
Heterotrait-monomethod coefficients

Definition
* Different traits, same method. Correlations between different traits measured by the same method. When small, indicates that the test has discriminant validity.


* Different traits, different methods. Correlations between different measures of different traits. When small, indicates discriminant validity.


Term
5 Steps of Factor Analysis 

Definition
1. Administer several tests to a group.
2. Correlate scores on each test with scores on every other test to obtain a correlation matrix.
3. Using 1 of several techniques, convert the correlation matrix to a factor matrix.
4. Simplify the interpretation of the factors by rotating them.
5. Interpret and name the factors in the rotated factor matrix.


Indicates common variance: the amount of variability in test scores that is due to factors the test shares in common with the other tests, or the total amount of variability in test scores that is explained by the identified factors.


1. Orthogonal: resulting factors are uncorrelated and independent.
2. Oblique: resulting factors are correlated and not independent.


Term
Criterion-Related Validity

Definition
* Is of interest when scores are to be used to conclude or predict how an examinee will likely stand on another measure. Assessed by correlating the scores of a sample of individuals on the predictor with their scores on the criterion.
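The correlation step can be sketched with a Pearson correlation between predictor and criterion scores. The card does not say which correlation coefficient is used, so Pearson's r is an assumption here, as are the function name and the sample scores.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample: predictor test scores and a later criterion measure.
predictor = [10, 12, 15, 18, 20]
criterion = [2.0, 2.5, 3.0, 3.4, 3.9]
validity_coefficient = pearson_r(predictor, criterion)
print(round(validity_coefficient, 2))
```

The closer the coefficient is to 1.0 in absolute value, the more accurately predictor scores indicate standing on the criterion.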


Criterion data are collected prior to or at about the same time as data on the predictor to estimate CURRENT status 


Predictive validity: criterion data are collected some time after the predictor data, to predict future performance on the criterion.


Items from original item pool included in the final version of the predictor test are those that are correlated the most with the criterion. 


Term
Norm-Referenced Interpretation

Definition
Involves comparing an examinee's score to scores obtained by people included in a normative sample. The raw score is converted to another score that indicates the examinee's relative standing in the norm group. * Examples: standard scores and percentile ranks


* Expresses an examinee's raw score in terms of the % of examinees in the norm sample who achieved lower scores. * Does not provide info about absolute differences, only order. Can say one score is higher than another, but not by how much.
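Following the definition above (percentage of the norm sample scoring lower), a percentile rank can be sketched like this. The function name and the norm sample are hypothetical, and other conventions exist for handling tied scores.

```python
def percentile_rank(raw_score, norm_scores):
    """Percent of norm-sample scores that fall below raw_score."""
    if not norm_scores:
        raise ValueError("norm sample must not be empty")
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100.0 * below / len(norm_scores)


# Hypothetical norm sample of 10 raw scores.
norm_sample = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
print(percentile_rank(80, norm_sample))  # 60.0
print(percentile_rank(85, norm_sample))  # 70.0
```

The two examinees differ by 10 percentile points, but that gap says nothing about how far apart their raw scores are; only the ordering is preserved.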


* When a raw test score is converted to a standard score, the transformed score indicates the examinee's position in the normative sample in terms of standard deviations from the mean.


Term
Properties of a Z-score distribution

Definition
1. The mean of the z-score distribution is equal to 0.
2. The standard deviation is equal to 1.
3. All raw scores below the mean become negative z-scores; those above the mean become positive.
4. Unless it is "normalized," the z-score distribution has the same shape as the raw score distribution.
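The first three properties can be checked with a short sketch that converts raw scores using z = (raw score - mean) / standard deviation. The function name and the raw scores are hypothetical, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Convert raw scores to z-scores using the population SD."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(x - m) / sd for x in raw_scores]


# Hypothetical raw scores with mean 5 and standard deviation 2.
raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(raw)
print(z)          # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z))    # 0.0
print(pstdev(z))  # 1.0
```

Because the conversion is linear, the z-scores keep the same ordering and the same distribution shape as the raw scores.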

