Term
| Alternate Forms Reliability |
|
Definition
| Method for estimating a test's reliability that entails administering two forms of the test to the same group of examinees and correlating the two sets of scores. Forms can be administered at about the same time (coefficient of equivalence) or at different times (coefficient of equivalence and stability). Considered by some experts to be the best (most thorough) method for assessing reliability. |
|
|
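The correlation described above is ordinarily a Pearson r between examinees' scores on the two forms. A minimal sketch (not from the source; function and variable names are illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two sets of scores,
    e.g., scores on Form A (x) and Form B (y) for the same examinees."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```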
Term
| Classical Test Theory (True Score Theory) |
Definition
| Theory of measurement that regards observed variability in test scores as reflecting two components: true differences between examinees on the attribute(s) measured by the test and the effects of measurement (random) error. Reliability is a measure of true score variability. |
|
|
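Under this model, observed-score variance decomposes into true-score variance plus error variance, and reliability is the true-score proportion. A toy illustration of that decomposition (names are illustrative, not from the source):

```python
def reliability(true_variance, error_variance):
    """Classical test theory: observed variance = true variance + error variance;
    reliability is the proportion of observed variance that is true score variance."""
    observed_variance = true_variance + error_variance
    return true_variance / observed_variance
```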
Term
| Coefficient Alpha / KR-20 |
|
Definition
| Method for assessing internal consistency reliability that provides an index of average inter-item consistency. The Kuder-Richardson Formula 20 (KR-20) can be used as a substitute for coefficient alpha when test items are scored dichotomously. |
|
|
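Coefficient alpha has a simple computational form: k/(k-1) times (1 minus the ratio of summed item variances to total-score variance). A self-contained sketch (illustrative, not from the source); for dichotomously scored 0/1 items, each item variance equals p(1-p), so the same computation yields KR-20:

```python
def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def coefficient_alpha(scores):
    """scores: one row per examinee, one column per item.
    alpha = k/(k-1) * (1 - sum of item variances / total-score variance).
    For 0/1 items this reduces to KR-20."""
    k = len(scores[0])
    items = list(zip(*scores))
    totals = [sum(row) for row in scores]
    return (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))
```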
Term
| Construct Validity / Convergent and Discriminant |
|
Definition
| The extent to which a test measures the hypothetical trait (construct) it is intended to measure. Methods for establishing construct validity include correlating test scores with scores on measures that do or do not measure the same trait (convergent and discriminant validity); conducting a factor analysis to assess the test's factorial validity; determining if changes in test scores reflect expected developmental changes; and seeing if experimental manipulations have the expected impact on test scores. |
|
|
Term
| Criterion Contamination |
Definition
| Refers to bias introduced into a person's criterion score as a result of the scorer's knowledge of his/her performance on the predictor. Tends to artificially inflate the relationship between the predictor and criterion. |
|
|
Term
| Criterion-Referenced Interpretation |
|
Definition
| Interpretation of a test score in terms of a prespecified standard; i.e., in terms of percent of content correct (percentage score) or of predicted performance on an external criterion (e.g., via a regression equation). |
|
|
Term
| Criterion-Related Validity / Concurrent and Predictive |
|
Definition
| The type of validity that involves determining the relationship (correlation) between the predictor and the criterion. The correlation coefficient is referred to as the criterion-related validity coefficient. Criterion-related validity can either be concurrent (predictor and criterion scores obtained at about the same time) or predictive (predictor scores obtained before criterion scores). |
|
|
Term
| Cross-Validation and Shrinkage |
|
Definition
| Process of re-assessing a test's criterion-related validity on a new sample to check the generalizability of the original validity coefficient. Ordinarily, the validity coefficient "shrinks" (becomes smaller) on cross-validation because the chance factors operating in the original sample are not all present in the cross-validation sample. |
|
|
Term
| Factor Analysis |
Definition
| A multivariate statistical technique used to determine how many factors (constructs) are needed to account for the intercorrelations among a set of tests, subtests, or test items. Factor analysis can be used to assess a test's construct validity by indicating the extent to which the test correlates with factors that it would and would not be expected to correlate with. From the perspective of factor analysis, true score variability consists of communality and specificity. Factors identified in a factor analysis can be either orthogonal or oblique. |
|
|
Term
| Factor Loadings and Communality |
|
Definition
| In factor analysis, a factor loading is the correlation between a test (or other variable included in the analysis) and a factor. The communality is the total amount of variability in scores on the test (or other variable) that is accounted for by the factor analysis (i.e., by the identified factors). |
|
|
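Because loadings are correlations, a test's communality can be computed as the sum of its squared loadings across the identified factors. A one-line sketch (illustrative, not from the source):

```python
def communality(loadings):
    """Sum of squared factor loadings for one test across the identified factors;
    the proportion of the test's variance accounted for by the factor analysis."""
    return sum(l ** 2 for l in loadings)
```

For example, a test loading .60 on Factor I and .50 on Factor II has a communality of .36 + .25 = .61.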
Term
| Incremental Validity/True Positives, False Positives, True Negatives, False Negatives |
|
Definition
| The extent to which a predictor increases decision-making accuracy. Calculated by subtracting the base rate from the positive hit rate. Terms linked with incremental validity include predictor and criterion cutoff scores, and true and false positives and negatives. True positives are those who scored high on the predictor and criterion; false positives scored high on the predictor but low on the criterion; true negatives scored low on the predictor and the criterion; and false negatives scored low on the predictor but high on the criterion. |
|
|
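The subtraction described above can be computed directly from the four decision counts. A sketch under the usual definitions (base rate = proportion of successes overall; positive hit rate = proportion of successes among those selected); names are illustrative:

```python
def incremental_validity(tp, fp, tn, fn):
    """Positive hit rate minus base rate, from decision counts:
    tp/fp = true/false positives, tn/fn = true/false negatives."""
    total = tp + fp + tn + fn
    base_rate = (tp + fn) / total        # successes on the criterion, regardless of predictor
    positive_hit_rate = tp / (tp + fp)   # successes among those the predictor selects
    return positive_hit_rate - base_rate
```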
Term
| Item Characteristic Curve |
|
Definition
| When using item response theory, an item characteristic curve (ICC) is constructed for each item by plotting the proportion of examinees in the tryout sample who answered the item correctly against either the total test score, performance on an external criterion, or a mathematically derived estimate of a latent ability or trait. The curve provides information on the relationship between an examinee's level on the ability or trait measured by the test and the probability that he/she will respond to the item correctly. |
|
|
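The card describes empirically plotted ICCs; one common parametric form (an assumption here, not stated in the source) is the two-parameter logistic model, in which the probability of a correct response rises as ability exceeds the item's difficulty:

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic ICC (one common parametric form):
    theta = examinee ability, a = item discrimination, b = item difficulty.
    Returns the probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

When ability equals difficulty (theta == b), the predicted probability of a correct response is .50.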
Term
| Item Difficulty |
Definition
| An item's difficulty level is calculated by dividing the number of individuals who answered the item correctly by the total number of individuals; ranges in value from 0 (very difficult item) to 1.0 (very easy item). In general, an item difficulty index of .50 is preferred because it maximizes differentiation between individuals with high and low ability and helps ensure a high reliability coefficient. |
|
|
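The division described above is the whole computation. A sketch (illustrative names, not from the source):

```python
def item_difficulty(responses):
    """responses: 1 = correct, 0 = incorrect, one entry per examinee.
    Returns p, ranging from 0 (very difficult) to 1.0 (very easy)."""
    return sum(responses) / len(responses)
```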
Term
| Item Discrimination |
Definition
| Item discrimination refers to the extent to which a test item discriminates (differentiates) between examinees who obtain high versus low scores on the entire test or an external criterion. The item discrimination index (D) ranges from -1.0 to +1.0. If all examinees in the upper group and none in the lower group answered the item correctly, D is +1.0. If none of the examinees in the upper group and all examinees in the lower group answered the item correctly, D equals -1.0. |
|
|
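D is simply the difference between the proportions correct in the two groups. A sketch (illustrative names, not from the source):

```python
def discrimination_index(upper, lower):
    """D = p_upper - p_lower: proportion answering correctly in the
    high-scoring group minus the proportion in the low-scoring group.
    upper/lower: lists of 0/1 responses."""
    p_upper = sum(upper) / len(upper)
    p_lower = sum(lower) / len(lower)
    return p_upper - p_lower
```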
Term
| Kappa Statistic |
Definition
| A correlation coefficient used to assess inter-rater reliability |
|
|
Term
| Multitrait-Multimethod Matrix |
|
Definition
| A systematic way to organize the correlation coefficients obtained when assessing a measure's convergent and discriminant validity (which, in turn, provides evidence of construct validity). Requires measuring at least two different traits using at least two different methods for each trait. Terms linked with the multitrait-multimethod matrix are monotrait-monomethod, monotrait-heteromethod, heterotrait-monomethod, and heterotrait-heteromethod coefficients. |
|
|
Term
| Norm-Referenced Interpretation |
|
Definition
| Interpretation of an examinee's test performance relative to the performance of examinees in a normative (standardization) sample. Percentile ranks and standard scores (e.g., z-scores and T scores) are types of norm-referenced scores. |
|
|
Term
| Orthogonal and Oblique Rotation |
|
Definition
| In factor analysis, an orthogonal rotation of the identified factors produces uncorrelated factors, while an oblique rotation produces correlated factors. Rotation is done to simplify the interpretation of the identified factors. |
|
|
Term
| Relationship between Reliability and Validity |
|
Definition
| Reliability is a necessary but not sufficient condition for validity. In terms of criterion-related validity, the validity coefficient can be no greater than the square root of the product of the reliabilities of the predictor and criterion. |
|
|
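The ceiling stated above is directly computable. A sketch (illustrative names, not from the source):

```python
import math

def max_validity(reliability_predictor, reliability_criterion):
    """Upper bound on the criterion-related validity coefficient:
    the square root of the product of the two reliabilities."""
    return math.sqrt(reliability_predictor * reliability_criterion)
```

For example, with predictor reliability .81 and criterion reliability .49, the validity coefficient can be no greater than .63.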
Term
| Relevance |
Definition
| In test construction, relevance refers to the extent to which test items contribute to achieving the stated goals of testing. |
|
|
Term
| Reliability / Reliability Coefficient |
|
|
Definition
| Reliability refers to the consistency of test scores; i.e., the extent to which a test measures an attribute without being affected by random fluctuations (measurement error) that produce inconsistencies over time, across items, or over different forms. Methods for establishing reliability include test-retest, alternate forms, split-half, coefficient alpha, and inter-rater. Most produce a reliability coefficient, which is interpreted directly as a measure of true score variability; e.g., a reliability of .80 indicates that 80% of the variability in test scores is true score variability. |
|
|
|
Term
| Split-Half Reliability / Spearman-Brown Formula |
|
Definition
| Split-half reliability is a method for assessing internal consistency reliability and involves "splitting" the test in half (e.g., odd- versus even-numbered items) and correlating examinees' scores on the two halves of the test. The split-half coefficient is usually corrected with the Spearman-Brown formula, which estimates what the test's reliability would be if it were based on the full length of the test. |
|
|
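The Spearman-Brown correction projects reliability for a test lengthened by a factor of n; with n = 2 it corrects a split-half coefficient to full test length. A sketch (illustrative names, not from the source):

```python
def spearman_brown(r_half, n=2):
    """Projected reliability when the test is lengthened by a factor of n.
    n=2 corrects a split-half coefficient to the full-length test."""
    return n * r_half / (1 + (n - 1) * r_half)
```

For example, a split-half coefficient of .60 corresponds to a full-length reliability of .75.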
Term
| Standard Error of Estimate / Confidence Interval |
|
Definition
| An index of error when predicting criterion scores from predictor scores. Used to construct a confidence interval around an examinee's predicted criterion score. Its magnitude depends on two factors: the criterion's standard deviation and the predictor's validity coefficient. |
|
|
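The two factors named above combine in the usual formula SEE = SD_y * sqrt(1 - r^2), where SD_y is the criterion's standard deviation and r is the validity coefficient. A sketch (illustrative names, not from the source):

```python
import math

def standard_error_of_estimate(sd_criterion, validity):
    """SEE = SD_y * sqrt(1 - r^2); a 95% confidence interval is
    predicted criterion score +/- 1.96 * SEE."""
    return sd_criterion * math.sqrt(1 - validity ** 2)
```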
Term
| Standard Error of Measurement / Confidence Interval |
|
Definition
| An index of measurement error. Used to construct a confidence interval around an examinee's obtained test score. Its magnitude depends on two factors: the test's standard deviation and reliability coefficient. |
|
|
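Here, too, the two named factors combine in the standard formula SEM = SD * sqrt(1 - r_xx). A sketch (illustrative names, not from the source):

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r_xx); a 95% confidence interval is
    obtained score +/- 1.96 * SEM."""
    return sd * math.sqrt(1 - reliability)
```

For example, a test with SD = 15 and reliability .84 has an SEM of 6.0, so a 95% confidence interval around an obtained score of 100 runs from roughly 88 to 112.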
Term
| Test Length / Range of Scores |
|
Definition
| A test's reliability can be increased in several ways. One way is to increase the test length by adding items of similar content and quality. Another is to increase the heterogeneity of the sample in terms of the attribute(s) measured by the test, which will increase the range of scores. |
|
|