Term
assessment: definition and purpose |
|
Definition
Any of a variety of ways to look at performance; asks "how well does the individual perform?" |
|
|
Term
test: definition and purpose |
|
Definition
An instrument OR systematic procedure with uniform questions used to sample behavior; asks "how well does the individual perform?" Can have an NRT or CRT framework |
|
|
Term
|
Definition
a line that represents a construct (ex: knowledge of the chemical properties of bases) |
|
|
Term
content standards v. performance standards |
|
Definition
"what do students need to know?" v. "how good is good enough?" (judgement! - cut score) |
|
|
Term
4 parts of assessment procedure |
|
Definition
Establish the assessment's: nature (max/typical), form (MC / constructed response / performance), use (placement / formative / summative / diagnostic), and method of interpreting (CRT v. NRT) |
|
|
Term
nature of assessment |
Definition
This is one part of the assessment procedure. Are you looking for "maximum/can do" or "typical/will do" performance? Implied assessment types: achievement test vs. surveys/observation |
|
|
Term
illustrative assessments for measuring "max performance" vs. "typical performance" |
|
Definition
max -> achievement or aptitude test; typical -> attitude surveys, observations. These categories are examples of deciding the "nature" of the assessment |
|
|
Term
form of assessment |
Definition
One part of the assessment procedure. Forms include: multiple choice (MC), constructed response, performance task |
|
|
Term
use of assessment |
Definition
One part of the assessment procedure. Uses include: placement, formative, diagnostic, summative |
|
|
Term
compare assessment types: placement, formative, diagnostic, summative (see p. 41, Table 2.1) |
|
Definition
Placement and summative are higher stakes. Formative is FYI (correction & reinforcement). Diagnostic determines the causes of struggle. Placement can be just for setting goals/modality |
|
|
Term
questions we are asking with assessment |
|
Definition
what do students know? what are they able to do? what do they need to do next? |
|
|
Term
methods of assessment (hint) |
|
Definition
hint: the methods of interpreting results, i.e., CRT vs. NRT |
|
|
Term
CRT |
Definition
Criterion-referenced test (no details yet) |
|
|
Term
NRT |
Definition
Norm-referenced test (no details yet) |
|
|
Term
How do test scores become meaningful? |
|
Definition
This is an essential question. An answer should address all aspects of validity (how many are there?) and reliability (specify the variety of types that might be of interest) |
|
|
Term
How can we use tests to improve education and society? |
|
Definition
Another essential question from lecture 2. An answer should include lots of hedged recommendations |
|
|
Term
validity: definition & types |
|
Definition
The degree to which an assessment instrument or procedure can be used/interpreted for a specific purpose (context dependent). An assessment should: (1) cover the content it purports to test, (2) correlate with specified, appropriate criteria, (3) generate results consistent with the implications of stated constructs (difficulty of items; Bloom's taxonomy), and (4) have consequences that are fair and appropriate. Validity determinations are largely a matter of judgment |
|
|
Term
reliability (table 5.1): definition, types, & methods |
|
Definition
The degree of consistency of the outcomes of an assessment. There are 5 different measures of reliability, each with its own method(s). One might measure (1) stability across time [using test-retest], (2) equivalence [using equivalent forms], (3) BOTH stability and equivalence [using equivalent forms over a time interval], (4) internal consistency [using the split-half, KR20/KR21, or Cronbach's alpha methods], or (5) consistency of ratings [using interrater methods].
Reliability is reported as statistical coefficients (0-1)
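A worked reading of the coefficient (standard classical-test-theory interpretation, not stated on the card): reliability estimates the proportion of observed-score variance that is true-score variance, i.e., r = (true-score variance) / (observed-score variance), so r = .90 means roughly 90% of the score variance is "true" and 10% is error |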
|
|
Term
validity vs. reliability |
Definition
Analogous to accuracy (I got what I wanted) vs. precision (I got the same result consistently). With tests, reliability is necessary but not sufficient for validity. VALID is specific to a particular stated purpose; RELIABLE is specific to a particular "sample" of takers (aka context, group) |
|
|
Term
content validity |
Definition
The degree to which an assessment instrument or procedure covers the content it purports to test. 4 steps to establishing it: (1) identify objectives, (2) build a know/do blueprint (Bloom), (3) make the test, (4) judge alignment |
|
|
Term
procedure for attaining content validity |
|
Definition
(1) identify objectives/goals, (2) build a table of specs (KNOW = content, DO = Bloom), (3) construct the test, (4) convene a panel to evaluate alignment |
|
|
Term
criterion-related validity |
|
Definition
Measure of scores' correlation with an "appropriate" criterion, which may be concurrent (e.g., current GPA) or predictive (e.g., future GPA). Although this aspect of validity involves a correlation coefficient, judgment is still required to decide what degree of correlation is good enough
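A hypothetical illustration (numbers invented for this card): if scores on an entrance test correlate r = .65 with later course grades, that is predictive criterion-related evidence; whether .65 is "good enough" to justify using the test for admissions remains a judgment call |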
|
|
Term
construct-related validity |
|
Definition
The degree to which an assessment generates results that are consistent with the implications of stated constructs (difficulty of items; Bloom's taxonomy). When you PROPOSE that an item fits a specific construct (e.g., "this is a comprehension question, so it's easy"), that construct implies the sort of scores you should get (HIGH). If the evidence (scores) fits that prediction, then your proposed construct interpretation is valid |
|
|
Term
consequential validity |
Definition
The degree to which an assessment instrument or procedure (including its interpretation) has consequences that are fair and appropriate |
|
|
Term
test-retest method |
Definition
Measure of the stability of test scores over time (one type of reliability) |
|
|
Term
equivalent-forms method |
Definition
Measure of the stability of test scores across different versions (forms) of the test (one type of reliability) |
|
|
Term
split-half method |
Definition
Measure of the stability of test scores from the two halves of items within a single test (one way to gauge internal consistency). Requires use of the Spearman-Brown formula
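For reference, the standard Spearman-Brown step-up formula (assumed here, not written on the card): full-test reliability = (2 × r_half) / (1 + r_half), where r_half is the correlation between the two half-tests. Worked example with invented numbers: r_half = .60 gives (2)(.60) / (1 + .60) = .75 |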
|
|
Term
KR20, KR21, & Cronbach's alpha |
Definition
KR20, KR21, and Cronbach's alpha coefficient are calculations that measure the internal consistency of a single test, which is one measure of reliability
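Standard formulas for reference (assumed, not on the card); k = number of items, σx² = variance of total scores, p = proportion passing an item, q = 1 - p, σi² = variance of item i, M = mean total score:
KR20 = [k/(k-1)] × [1 - (Σpq)/σx²]
KR21 = [k/(k-1)] × [1 - M(k - M)/(k·σx²)]
Cronbach's alpha = [k/(k-1)] × [1 - (Σσi²)/σx²]
KR20/KR21 apply to items scored right/wrong; alpha generalizes KR20 to items with any scoring |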
|
|
Term
interrater methods |
Definition
Ways to measure the stability of scores when the same test is scored by different raters. This is one type of reliability (consistency of ratings) |
|
|
Term
ways that tests may be consistent or not, aka sources of variation (Table 5.4) |
|
Definition
1. testing procedure (use any method except interrater)
2. student "characteristics"/response (use time-interval methods)
3. sample of items (use equivalent forms or internal consistency)
4. judgmental scores (use interrater methods) |
|
|
Term
consistency in testing procedure |
|
Definition
Part of reliability; inconsistency will be detected by all methods of reliability estimation EXCEPT interrater |
|
|
Term
consistency in student characteristics (how kids respond to the test) |
|
Definition
Part of reliability; inconsistency will be detected by any method with a time interval and, to a less useful extent, by test-retest |
|
|
Term
consistency over different samples of items |
|
Definition
Part of reliability; inconsistency will be detected by equivalent-forms OR internal-consistency methods |
|
|
Term
internal consistency |
Definition
One aspect of reliability; it can be measured by the split-half method (which requires the Spearman-Brown formula) or by KR20, KR21, and Cronbach's alpha coefficients (remember "generalizability theory" too?) |
|
|
Term
SEM (standard error of measurement) |
Definition
Standard error of measurement (need to know the formula?). To get a range of likely values for a student's "true" score, add a "confidence band" of +/- 1 SEM around every score for a category/domain.
SEM is determined by the standard deviation of the test scores AND the test's reliability coefficient (Table 5.6)
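Standard formula for reference (assumed, not on the card): SEM = SD × √(1 - r), where SD is the standard deviation of the test scores and r is the reliability coefficient. Worked example with invented numbers: SD = 10 and r = .91 give SEM = 10 × √(.09) = 3, so an observed score of 75 gets a band of about 72-78 |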
|
|
Term
table of specifications |
Definition
The 2nd step of the process for establishing content validity. The table can separate out both content and construct (Bloom's taxonomy). In the 4th step, items from the test are placed into "cells" in the spec table to see whether they are distributed the way you intended
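A minimal illustrative blueprint (content and counts invented for this card): rows list the content to KNOW, columns list the Bloom level to DO, and each cell holds the planned number of items; e.g., "properties of bases" might get 3 remember items, 4 apply items, and 1 analyze item. In step 4, the review panel checks whether the finished test actually fills the cells that way |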
|
|
Term
reliability coefficient |
Definition
A correlation coefficient that relates to the reliability of a test (e.g., the correlation between 2 forms of the test, or test-retest, or between odd & even items). Correlation coefficients range between -1 and +1, but reliability can only go as low as zero; a negative correlation coefficient is reported as "zero" reliability
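For reference, the correlation in question is usually the Pearson r (standard formula, assumed rather than taken from the card): r = Σ(x - mean_x)(y - mean_y) / √[Σ(x - mean_x)² × Σ(y - mean_y)²], computed over paired scores x and y (two forms, two occasions, or odd vs. even halves) |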
|
|