Term
assessment: definition and purpose |
|
Definition
Any of a variety of ways to look at performance; asks "how well does the individual perform?" |
|
|
Term
test: definition and purpose |
|
Definition
An instrument OR systematic procedure with uniform questions used to sample behavior; asks "how well does the individual perform?" Can have an NRT or CRT framework |
|
|
Term
|
Definition
a line that represents a construct (ex: knowledge of the chemical properties of bases) |
|
|
Term
content standards v. performance standards |
|
Definition
"what do students need to know?" v. "how good is good enough?" (judgement! - cut score) |
|
|
Term
4 parts of assessment procedure |
|
Definition
Establish the assessment's: nature (max/typical), form (MC / constructed response / performance), use (placement / formative / summative / diagnostic), and method of interpreting (CRT v. NRT) |
|
|
Term
nature of assessment |
Definition
This is one part of the assessment procedure. Are you looking for "maximum/can do" or "typical/will do" performance? Implied assessment types: achievement test vs. surveys/observation |
|
|
Term
illustrative assessments for measuring "max performance" vs. "typical performance" |
|
Definition
max -> achievement or aptitude test; typical -> attitude surveys, observations. These categories are examples of deciding the "nature" of the assessment |
|
|
Term
form of assessment |
Definition
One part of the assessment procedure. Forms include: multiple choice (MC), constructed response, performance task |
|
|
Term
use of assessment |
Definition
One part of the assessment procedure. Uses include: placement, formative, diagnostic, summative |
|
|
Term
compare assessment types: placement, formative, diagnostic, summative (see p. 41, Table 2.1) |
|
Definition
Placement and summative are higher stakes. Formative is FYI (correction & reinforcement). Diagnostic determines the causes of struggle. Placement can be just for setting goals/modality |
|
|
Term
questions we are asking with assessment |
|
Definition
what do students know? what are they able to do? what do they need to do next? |
|
|
Term
methods of assessment (hint) |
|
Definition
hint: the methods of interpreting results, i.e., CRT vs. NRT |
|
|
Term
CRT |
Definition
Criterion-referenced test (no details yet) |
|
|
Term
NRT |
Definition
Norm-referenced test (no details yet) |
|
|
Term
How do test scores become meaningful? |
|
Definition
This is an essential question. An answer should address all aspects of validity (how many are there?) and reliability (specify the variety of types that might be of interest) |
|
|
Term
How can we use tests to improve education and society? |
|
Definition
Another essential question from lecture 2. An answer should include lots of hedged recommendations |
|
|
Term
validity: definition & types |
|
Definition
The degree to which an assessment instrument or procedure can be used/interpreted for a specific purpose (context dependent). An assessment should: (1) cover the content it purports to test, (2) correlate with specified, appropriate criteria, (3) generate results consistent with the implications of stated constructs (difficulty of items; Bloom's taxonomy), and (4) have consequences that are fair and appropriate. Validity determinations are largely a matter of judgment |
|
|
Term
reliability (table 5.1): definition, types, & methods |
|
Definition
The degree of consistency of the outcomes of an assessment. There are 5 different measures of reliability, each with its own method(s). One might measure (1) stability across time [using test-retest], (2) equivalence [using equivalent forms], (3) BOTH stability and equivalence [using equivalent forms over a time interval], (4) internal consistency [using the split-half, KR20/KR21, or Cronbach's alpha methods], or (5) consistency of ratings [using interrater methods].
Reliability is reported as statistical coefficients (0-1)
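A worked reading of the coefficient (standard classical-test-theory interpretation, not stated on the card): reliability estimates the proportion of observed-score variance that is true-score variance, i.e., r = (true-score variance) / (observed-score variance), so r = .90 means roughly 90% of the score variance is "true" and 10% is error |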
|
|
Term
validity vs. reliability |
Definition
Analogous to accuracy (I got what I wanted) vs. precision (I got the same result consistently). With tests, reliability is necessary but not sufficient for validity. VALID is specific to a particular stated purpose; RELIABLE is specific to a particular "sample" of takers (aka context, group) |
|
|
Term
content validity |
Definition
The degree to which an assessment instrument or procedure covers the content it purports to test. 4 steps to establishing it: (1) identify objectives, (2) build a know/do blueprint (Bloom), (3) make the test, (4) judge alignment |
|
|
Term
procedure for attaining content validity |
|
Definition
(1) identify objectives/goals, (2) build a table of specs (KNOW = content, DO = Bloom), (3) construct the test, (4) convene a panel to evaluate alignment |
|
|
Term
criterion-related validity |
|
Definition
Measure of scores' correlation with an "appropriate" criterion, which may be concurrent (e.g., current GPA) or predictive (e.g., future GPA). Although this aspect of validity involves a correlation coefficient, judgment is still required to decide what degree of correlation is good enough
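A hypothetical illustration (numbers invented for this card): if scores on an entrance test correlate r = .65 with later course grades, that is predictive criterion-related evidence; whether .65 is "good enough" to justify using the test for admissions remains a judgment call |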
|
|
Term
construct-related validity |
|
Definition
The degree to which an assessment generates results that are consistent with the implications of stated constructs (difficulty of items; Bloom's taxonomy). When you PROPOSE that an item fits a specific construct (e.g., "this is a comprehension question, so it's easy"), that construct implies the sort of scores you should get (HIGH). If the evidence (scores) fits that prediction, then your proposed construct interpretation is valid |
|
|
Term
consequential validity |
Definition
The degree to which an assessment instrument or procedure (including its interpretation) has consequences that are fair and appropriate |
|
|
Term
test-retest method |
Definition
Measure of the stability of test scores over time (one type of reliability) |
|
|
Term
equivalent-forms method |
Definition
Measure of the stability of test scores across different versions (forms) of the test (one type of reliability) |
|
|
Term
split-half method |
Definition
Measure of the stability of test scores from the two halves of items within a single test (one way to gauge internal consistency). Requires use of the Spearman-Brown formula
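For reference, the standard Spearman-Brown step-up formula (assumed here, not written on the card): full-test reliability = (2 × r_half) / (1 + r_half), where r_half is the correlation between the two half-tests. Worked example with invented numbers: r_half = .60 gives (2)(.60) / (1 + .60) = .75 |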
|
|
Term
KR20, KR21, & Cronbach's alpha |
Definition
KR20, KR21, and Cronbach's alpha coefficient are calculations that measure the internal consistency of a single test, which is one measure of reliability
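Standard formulas for reference (assumed, not on the card); k = number of items, σx² = variance of total scores, p = proportion passing an item, q = 1 - p, σi² = variance of item i, M = mean total score:
KR20 = [k/(k-1)] × [1 - (Σpq)/σx²]
KR21 = [k/(k-1)] × [1 - M(k - M)/(k·σx²)]
Cronbach's alpha = [k/(k-1)] × [1 - (Σσi²)/σx²]
KR20/KR21 apply to items scored right/wrong; alpha generalizes KR20 to items with any scoring |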
|
|
Term
interrater methods |
Definition
Ways to measure the stability of scores when the same test is scored by different raters. This is one type of reliability (consistency of ratings) |
|
|
Term
ways that tests may be consistent or not, aka sources of variation (Table 5.4) |
|
Definition
1. testing procedure (use any method except interrater)
2. student "characteristics"/response (use time-interval methods)
3. sample of items (use equivalent forms or internal consistency)
4. judgmental scores (use interrater methods) |
|
|
Term
consistency in testing procedure |
|
Definition
Part of reliability; inconsistency will be detected by all methods of reliability estimation EXCEPT interrater |
|
|
Term
consistency in student characteristics (how kids respond to the test) |
|
Definition
Part of reliability; inconsistency will be detected by any method with a time interval and, to a less useful extent, by test-retest |
|
|
Term
consistency over different samples of items |
|
Definition
Part of reliability; inconsistency will be detected by equivalent-forms OR internal-consistency methods |
|
|
Term
internal consistency |
Definition
One aspect of reliability; it can be measured by the split-half method (which requires the Spearman-Brown formula) or by KR20, KR21, and Cronbach's alpha coefficients (remember "generalizability theory" too?) |
|
|
Term
SEM (standard error of measurement) |
Definition
Standard error of measurement (need to know the formula?). To get a range of likely values for a student's "true" score, add a "confidence band" of +/- 1 SEM around every score for a category/domain.
SEM is determined by the standard deviation of the test scores AND the test's reliability coefficient (Table 5.6)
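Standard formula for reference (assumed, not on the card): SEM = SD × √(1 - r), where SD is the standard deviation of the test scores and r is the reliability coefficient. Worked example with invented numbers: SD = 10 and r = .91 give SEM = 10 × √(.09) = 3, so an observed score of 75 gets a band of about 72-78 |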
|
|
Term
table of specifications |
Definition
The 2nd step of the process for establishing content validity. The table can separate out both content and construct (Bloom's taxonomy). In the 4th step, items from the test are placed into "cells" in the spec table to see whether they are distributed the way you intended
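A minimal illustrative blueprint (content and counts invented for this card): rows list the content to KNOW, columns list the Bloom level to DO, and each cell holds the planned number of items; e.g., "properties of bases" might get 3 remember items, 4 apply items, and 1 analyze item. In step 4, the review panel checks whether the finished test actually fills the cells that way |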
|
|
Term
reliability coefficient |
Definition
A correlation coefficient that relates to the reliability of a test (e.g., the correlation between 2 forms of the test, or test-retest, or between odd & even items). Correlation coefficients range between -1 and +1, but reliability can only go as low as zero; a negative correlation coefficient is reported as "zero" reliability
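For reference, the correlation in question is usually the Pearson r (standard formula, assumed rather than taken from the card): r = Σ(x - mean_x)(y - mean_y) / √[Σ(x - mean_x)² × Σ(y - mean_y)²], computed over paired scores x and y (two forms, two occasions, or odd vs. even halves) |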
|
|