Shared Flashcard Set

Details

Educational Measurement
Validity, Reliability, and Interpretation of test scores
38
Education
Undergraduate 1
06/27/2009

Cards

Term
assessment:
definition and purpose
Definition
any of a variety of ways to look at performance.
how well does the individual perform?
Term
test:
definition and purpose
Definition
instrument OR systematic procedure with uniform questions to sample behavior
how well does the individual perform?
can have NRT or CRT framework
Term
latent variable
Definition
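an unobservable trait (construct) that is inferred from test responses; often pictured as a line (continuum) along which individuals are located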
ex: knowledge of chemical properties of bases
Term
content standards v. performance standards
Definition
"what do students need to know?" v.
"how good is good enough?" (judgement! - cut score)
Term
4 parts of assessment procedure
Definition
establish the nature (maximum/typical performance), form (MC/constructed response/performance task), use (placement/formative/summative/diagnostic), and method of interpreting (CRT v. NRT) the assessment
Term
natures of assessment
Definition
this is one part of the assessment procedure. Are you looking for "maximum/can do" or "typical/will do" performance? implied assessment type: achievement test vs. surveys/observation
Term
illustrative assessments for measuring "max performance" vs. "typical performance"
Definition
max -> achievement or aptitude test
typical -> attitude surveys, observations
these categories are examples of deciding the "nature" of assessment
Term
forms of assessment
Definition
one part of the assessment procedure. Forms include:
MC, constructed response, performance task
Term
uses of assessment
Definition
one part of the assessment procedure. Uses include:
placement, formative, diagnostic, summative
Term
compare assessment types: placement, formative, diagnostic, summative (see p. 41 table 2.1)
Definition
placement and summative are higher stakes. formative is FYI, correction & reinforcement
diagnostic determines causes of struggle
placement can be just for goals/modality
Term
questions we are asking with assessment
Definition
what do students know?
what are they able to do?
what do they need to do next?
Term
methods of assessment (hint)
Definition
hint: methods of interpreting
CRT vs. NRT
Term
CRT
Definition
criterion-referenced test. No details yet
Term
NRT
Definition
norm-referenced test. No details yet
Term
central policy issues
Definition
DIE!
Term
How do test scores become meaningful?
Definition
this is an essential question. answer should address all aspects of validity (how many are there?) and reliability (specify the variety of types that might be of interest)
Term
How can we use tests to improve education and society?
Definition
another essential question from lecture 2.
answer should have lots of hedged recommendations.
Term
validity:
definition & types
Definition
the degree to which an assessment instrument or procedure can be used/interpreted for a specific purpose (context dependent).
assessment should: (1) cover content it purports to test, (2) correlate with specified, appropriate criteria, (3) generate results consistent with implications of stated constructs (difficulty of items; Bloom's taxonomy), (4) have consequences that are fair and appropriate.
validity determinations are largely a matter of judgment.
Term
reliability (table 5.1):
definition, types, & methods
Definition
the degree of consistency of the outcomes of an assessment. 5 different measures of reliability, plus method(s) for each. One might measure (1) stability across time [using test-retest], (2) equivalence [using equivalent forms], (3) BOTH stability and equivalence [using test-retest with equivalent forms], (4) internal consistency [using 3 methods: split-half, Kuder-Richardson, coefficient alpha], or (5) consistency of ratings [using interrater methods]

reliability is reported as statistical coefficients (0-1)
Term
validity v. reliability
Definition
analogous to accuracy (I got what I wanted) vs. precision (I got the same result consistently).
with tests, reliability is necessary but not sufficient for validity.
VALID is specific to a particular stated purpose. RELIABLE is specific to a particular "sample" of takers (aka, context, group)
Term
content-related validity
Definition
the degree to which an assessment instrument or procedure covers content it purports to test. 4 steps to establishing: (1) objectives?, (2) know-do blueprint (bloom), (3) make test, (4) judge alignment
Term
procedure for attaining content validity
Definition
(1) identify objectives/goals, (2) build table of specs (KNOW = content, DO = Bloom), (3) construct test, (4) panel to evaluate alignment
Term
criterion-related validity
Definition
measure of how well scores correlate with an “appropriate” criterion, which may be concurrent (e.g., current GPA) OR predictive (e.g., future GPA). Although this aspect of validity involves a correlation coefficient, judgment is still required to decide what degree of correlation is good enough.
Term
construct-related validity
Definition
the degree to which an assessment generates results that are consistent with the implications of stated constructs (difficulty of items; bloom's taxonomy); when you PROPOSE that an item fits a specific construct (eg, this is a comprehension question - it’s easy) then that construct implies the sorts of scores you should get (HIGH). If evidence (scores) fit that prediction, then your proposed construct interpretation is valid.
Term
consequential validity
Definition
the degree to which an assessment instrument or procedure (including interpretation) has consequences that are fair and appropriate.
Term
test-retest
Definition
measure of stability of test scores over time (one type of reliability)
Term
equivalent forms
Definition
measure of the equivalence of test scores across different versions (forms) of the same test (one type of reliability)
Term
split-half
Definition
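measure of the consistency of scores between two halves of the items within a single test (one type of internal consistency). Requires the Spearman-Brown formula to step the half-test correlation up to a full-length estimate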
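The split-half idea can be sketched in a few lines. This is a minimal illustration, assuming an odd/even split of the items and the standard Spearman-Brown step-up for a test twice the length of each half, r_full = 2 * r_half / (1 + r_half). The function names and the sample data are made up for the example and are not from the course text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length reliability estimate."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one row per student, one 0/1 (or point) score per item."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_totals, even_totals))


# Hypothetical data: 3 students, 4 items each.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(split_half_reliability(responses))
```

The step-up is needed because each half is only half as long as the real test, and shorter tests tend to be less reliable.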
Term
KR20
Definition
KR20 and KR21 and Cronbach's alpha coefficient are calculations that measure the internal consistency of a single test, which is one measure of reliability
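A rough sketch of how KR20 and Cronbach's alpha might be computed from a table of item scores. It assumes the usual textbook forms, KR20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores) and alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), with population variances throughout. The function names and data are illustrative only. For right/wrong (0/1) items the two give the same number.

```python
def pvariance(values):
    """Population variance (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def kr20(item_scores):
    """KR20 for 0/1 items. item_scores: one row per student."""
    n = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_scores) / n  # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


def cronbach_alpha(item_scores):
    """Coefficient alpha; also works for items scored on a point scale."""
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / pvariance(totals))


# Hypothetical data: 4 students, 3 right/wrong items.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(answers), cronbach_alpha(answers))
```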
Term
interrater methods
Definition
ways to measure the consistency of the scores that different raters give to the same test. This is one type of reliability.
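The cards do not say which interrater statistic the course uses. As an assumption, here is a small sketch of two common choices: simple percent agreement, and Cohen's kappa, which corrects agreement for chance. The function names and ratings are made up.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of papers on which both raters gave the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's share for every category.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1-3) from two raters on four essays.
rater_1 = [1, 2, 3, 3]
rater_2 = [1, 2, 3, 2]
print(percent_agreement(rater_1, rater_2), cohens_kappa(rater_1, rater_2))
```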
Term
ways that tests may be consistent or not, aka, sources of variation (table 5.4)
Definition
1. testing procedure (use any method except interrater)
2. student "characteristics"/response (use time-interval methods)
3. sample items (use eq forms or internal consist)
4. judgmental scores (use interraters)
Term
consistency in testing procedure
Definition
part of reliability; inconsistency will be detected by all methods of reliability estimation EXCEPT interrater
Term
consistency in student characteristics (how kids respond to test)
Definition
part of reliability; inconsistency will be detected by any time-interval method, and to a lesser (less useful) extent by immediate test-retest
Term
consistency over diff. samples of items
Definition
part of reliability; inconsistency will be detected by equivalent forms OR internal consistency methods (test-retest reuses the same items, so it cannot show variation across item samples)
Term
internal consistency
Definition
one aspect of reliability, it can be measured by split-half method (which requires spearman-brown formula), KR20 and KR21 and Cronbach's alpha coefficient (remember "generalizability theory" too?)
Term
SEM
Definition
standard error of measurement (need to know formula?). To get the range of likely values for a student's "true" score, put a "confidence band" of +/- 1 SEM around every score for a category/domain.

SEM is determined by the standard deviation of the test scores (for the group, not for one student) AND the test's reliability coefficient (table 5.6)
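For reference, the usual formula is SEM = SD * sqrt(1 - reliability), where SD is the standard deviation of the group's test scores. A minimal sketch follows, with made-up numbers and hypothetical function names. Check the textbook table for the values the course expects.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)


def confidence_band(score, sd, reliability, n_sem=1):
    """Band of +/- n_sem SEMs around an observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return (score - n_sem * sem, score + n_sem * sem)


# Hypothetical test: SD of 10 points, reliability of 0.91.
# SEM comes out at roughly 3 points, so a score of 75 gets a band of about 72 to 78.
print(standard_error_of_measurement(10, 0.91))
print(confidence_band(75, 10, 0.91))
```

Note how the band shrinks as reliability rises: with perfect reliability the SEM is zero, and with zero reliability it equals the full standard deviation.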
Term
table of specifications
Definition
2nd step of process for establishing content validity. Table can separate out both content and construct (Bloom's taxonomy). In 4th step, items from test are placed into "cells" in the spec. table to see if they are distributed in the way you intended
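The 4th-step check can be pictured as a simple tally. In this sketch each cell of the table is a (content, Bloom level) pair, the plan says how many items each cell should have, and each test item is tagged with the cell the panel placed it in. All topics, levels, counts, and names below are invented for illustration.

```python
from collections import Counter


def alignment_gaps(planned, item_cells):
    """Return {cell: actual - planned} for every cell that is off plan."""
    actual = Counter(item_cells)
    gaps = {}
    for cell in set(planned) | set(actual):
        difference = actual[cell] - planned.get(cell, 0)
        if difference != 0:
            gaps[cell] = difference
    return gaps


# Hypothetical blueprint: how many items were intended per cell.
planned = {
    ("acids", "knowledge"): 2,
    ("acids", "application"): 1,
    ("bases", "knowledge"): 1,
}

# Hypothetical panel judgments: the cell each written item landed in.
item_cells = [
    ("acids", "knowledge"),
    ("acids", "knowledge"),
    ("acids", "knowledge"),
    ("bases", "knowledge"),
]

print(alignment_gaps(planned, item_cells))
```

A positive number means the cell has more items than intended, and a negative number means it has fewer.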
Term
reliability coefficient
Definition
a correlation coefficient that expresses the reliability of a test (e.g., correlation between 2 forms of a test, or test-retest, or between odd & even items, etc). Correlation coefficients range from -1 to +1, but reliability can only go as low as zero. A negative correlation coefficient is reported as "zero" reliability.
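As a quick illustration, the sketch below correlates two sets of scores for the same students (two forms, or test and retest) and reports any negative correlation as zero, as the card describes. The function names and scores are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def reliability_coefficient(scores_first, scores_second):
    """Correlation between two administrations, floored at zero."""
    return max(0.0, pearson(scores_first, scores_second))


# Hypothetical scores for five students on Form A and Form B.
form_a = [70, 82, 91, 65, 77]
form_b = [72, 80, 94, 60, 79]
print(reliability_coefficient(form_a, form_b))
```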