Term
| 4 points of Meier: Why study measurement? |
|
Definition
| (1) Graduate students don't have a good understanding of measurement, (2) Lack of appreciation for its importance in the advancement of science, (3) Overemphasis on home-grown scales with unknown psychometric properties, and (4) Need more instruction on how to evaluate/report psychometric properties |
|
|
Term
| Complete quote: "Anything that exists in an amount can be measured and..." |
|
Definition
| "...anything that can be measured can be measured inaccurately." |
|
|
Term
| In psychology, we typically study |
|
Definition
| Hypothetical constructs (latent variables). |
|
|
Term
| What are hypothetical constructs |
|
Definition
| Abstract concepts used to relate different behaviors according to their underlying features or causes. |
|
|
Term
| What is the key feature of hypothetical constructs? |
|
Definition
| They cannot be directly observed; they must be inferred from observable behaviors. |
|
|
Term
| Constructs are assumed to exist based on |
|
Definition
| Observable behaviors (their indicators). |
|
|
Term
| Behaviors and constructs have a ______ relationship |
|
Definition
| Causal (the construct is presumed to cause the behaviors, or the behaviors to cause the construct). |
|
|
Term
| Scale |
|
Definition
| Consists of effect indicators. In the example of depression, these would be weight fluctuation, crying spells, suicidality, etc. |
|
|
Term
| Index |
|
Definition
| Consists of cause indicators. These influence the level of a construct. In the example of graduate school performance, an index would assess for parental income, organizational skills, motivation, value of higher education, etc. |
|
|
Term
| Latent variable |
|
Definition
| Presumed to "cause" an item score in whole or in part. These are the hypothetical constructs of interest. |
|
|
Term
| We are less interested in ______ and more interested in ______ |
|
Definition
| The items themselves; the underlying construct they represent. |
|
|
Term
| "A well-designed study cannot make up for..." |
|
Definition
| "...problematic measurement." |
|
|
Term
| The item value should be correlated with... |
|
Definition
| True score of the construct |
|
|
Term
| True or false: True scores are measured. |
|
Definition
| False: true scores cannot be measured directly, because observed scores also reflect error and extraneous variables/factors. |
|
|
Term
|
Definition
| Scores on a measure should co-vary with the true score. |
|
|
Term
|
Definition
| The difference between the variable (true score) and how it is represented in measurement (observed score). Variance due to the operation of the latent variable. |
|
|
Term
| Observed score |
|
Definition
| True score (of the latent variable) + measurement error |
|
|
Term
| Measurement error |
|
Definition
| Residual variance that may be random or systematic |
|
|
Term
| 3 assumptions regarding error in classical measurement theory (CMT) |
|
Definition
| (1) Amount of error associated with individual items varies randomly, (2) Error terms are uncorrelated across items, and (3) Error terms are uncorrelated with the true score of the latent variable. |
|
|
Term
| Standardized path coefficients |
|
Definition
| Show the strength of the relationship between the latent construct and each variable of interest. A good path diagram of a latent variable will have path coefficients that are high (close to 1). |
|
|
Term
| Item correlations allow for |
|
Definition
| Estimation of path coefficients |
|
|
Term
| The best index of true score is |
|
Definition
| The total scale score (the sum of the items). |
|
|
Term
| The cross-products of path coefficients are |
|
Definition
| The correlations between items. |
|
|
Term
| Parallel and strictly parallel assumptions of CMT |
|
Definition
| (1) The amount of influence from the latent construct to each item is the same, and (2) Each item is assumed to have the same amount of error (under this assumption, the path coefficients from the construct to all items should be the same). |
|
|
Term
| Congeneric model (Joreskog, 1971) |
|
Definition
| All items share a common latent variable, latent variables need not exert the same strength of relationship to all items (path coefficients may be different/unequal), and error variables need not be equal. The latter two occur when relaxing the assumptions, which also allows you to see the latent variable influence on indicators. |
|
|
Term
| Maximum likelihood estimation |
|
Definition
Most commonly used technique; estimates the likelihood that the observed correlations are drawn from a population that is the same as/similar to your sample.
- Assumes large N (a few hundred)
- Assumes a multivariate normal distribution
- Individual variables are normally distributed
- Assumes continuous variables
- Chooses estimates that have the greatest chance of reproducing the obtained data (correlation matrix)
|
|
|
Term
| Non-moderator variables = |
|
Definition
|
|
Term
| Moderator variables can be |
|
Definition
|
|
Term
| Relationship between item-total correlations and standardized path coefficients |
|
Definition
| Item total correlations (relationship between item and scale) allow for estimation of path coefficients. |
|
|
Term
| Problem with item-total correlations |
|
Definition
| The total/scale is influenced by the item to which it is being compared. |
|
|
Term
| If the item-total correlation for an item is less than alpha, then |
|
Definition
|
|
Term
| How can interrelationships among test items be empirically assessed? |
|
Definition
| Through reliability testing (coefficient alpha/internal consistency). Covariance matrices are also used. |
|
|
Term
| What is the difference between a variance-covariance matrix and a correlation matrix? |
|
Definition
| A correlation matrix is standardized, whereas a variance-covariance matrix is not and maintains the original scaling of the items. |
|
|
Term
| What is a variance-covariance matrix? |
|
Definition
| A matrix that represents the score variations of variables (variances), pairs of variables (covariances), and the entire dataset. |
|
|
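A quick way to see the distinction between the two matrices is to compute both from the same data. A minimal sketch (not part of the original deck), assuming NumPy; the item scalings are invented for illustration:
```python
import numpy as np

# Four hypothetical items on deliberately different scalings
scores = np.random.default_rng(2).normal(size=(100, 4)) * [1, 2, 5, 10]

cov = np.cov(scores, rowvar=False)        # unstandardized; diagonal holds item variances
corr = np.corrcoef(scores, rowvar=False)  # standardized; diagonal is all 1s
print(np.diag(cov).round(1), np.diag(corr))
```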
Term
| What do the diagonals and the off-diagonals in a variance-covariance matrix represent? |
|
Definition
| The diagonals are the item variances and the off-diagonals are the item covariances. |
|
|
Term
| What is coefficient alpha? |
|
Definition
| Coefficient alpha is concerned with the homogeneity of items within a scale. It shows the proportion of a scale's total variance that is attributable to a common source of a latent variable underlying the items. The higher the value of coefficient alpha, the more the items are related to each other. |
|
|
Term
| What is communal variation? |
|
Definition
| The off-diagonal elements (item covariances) of the covariance matrix. |
|
|
Term
| What is non-communal variation? |
|
Definition
| The diagonal elements (unique item variances) of the covariance matrix. |
|
|
Term
| What is the relationship between coefficient alpha, communal variation, and non-communal variation? |
|
Definition
| Coefficient alpha is the ratio of communal variation to total (communal plus non-communal) variation; it indexes how related the items are to each other. |
|
|
Term
| Reliability |
|
Definition
| The consistency of measurement across time or items. DeVellis: "...proportion of variance attributable to the true score of the latent variable." |
|
|
Term
| What are some methods for determining reliability? |
|
Definition
| Internal consistency, temporal stability, split-half reliability, and other forms of reliability that require a parallel version of the instrument. |
|
|
Term
| What is internal consistency? |
|
Definition
| Coefficient alpha. |
|
|
Term
| What is an example of temporal stability? |
|
Definition
| Test-retest reliability. |
|
|
Term
| What is split-half reliability? |
|
Definition
| The first half of the measure is correlated with the second half. It is related to coefficient alpha in that the average of all possible split-half reliabilities approximates alpha. |
|
|
Term
| Schmitt (1996): "Alpha is not a good measure of homogeneity, rather it is a good measure of _____." |
|
Definition
| Internal consistency. |
|
|
Term
| Variability in a set of scores is due to... |
|
Definition
| Actual variation across individuals, error, or alpha. |
|
|
Term
|
Definition
| Actual variation across individuals. |
|
|
Term
|
Definition
| Alpha or internal consistency reliability. |
|
|
Term
| Total variance |
|
Definition
| The variance for the measure as a whole. |
|
|
Term
| In order to compute alpha, we need... |
|
Definition
| Estimates of total variance and common variance. |
|
|
Term
| The estimate of total variance is |
|
Definition
| C, which is the sum of all matrix components. |
|
|
Term
| Two assumptions when calculating alpha |
|
Definition
| (1) Items in a scale are correlated because they are all affected by the same latent variable (Y), and (2) Error terms are uncorrelated and reflect the unique variation each item possesses. |
|
|
Term
| As a measure of reliability and as a ratio, coefficient alpha is... |
|
Definition
| Common-source variation to total variation in scores. |
|
|
Term
| What influences item score and scale score? |
|
Definition
| (1) Source variance that is common to itself and other items, and (2) Unique unshared variance (error). |
|
|
Term
| The variance of a k-item scale (k being the number of items) equals |
|
Definition
| The sum of all the matrix items. |
|
|
Term
| The sum of the elements along the main diagonal equals |
|
Definition
| The sum of the variances of individual items. |
|
|
Term
| The sum of the matrix elements equals |
|
Definition
| The total variance for the scale. |
|
|
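The cards above give everything needed to compute alpha from a covariance matrix: the sum of all matrix elements is the total scale variance, and the main diagonal holds the item variances. A minimal sketch in Python (assuming NumPy; the data are simulated, not from the deck):
```python
import numpy as np

def coefficient_alpha(items):
    """Alpha = (k / (k - 1)) * (1 - sum of item variances / total scale variance)."""
    cov = np.cov(items, rowvar=False)  # variance-covariance matrix of the items
    k = cov.shape[0]                   # number of items
    total_variance = cov.sum()         # sum of ALL matrix elements = variance of the k-item scale
    item_variances = np.trace(cov)     # sum of the main diagonal
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Simulated respondents-by-items matrix with a common source
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
items = common + rng.normal(scale=0.8, size=(500, 5))
print(round(coefficient_alpha(items), 3))
```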
Term
| Communal variation means |
|
Definition
| Variability equals changes in the value of the common source. |
|
|
Term
| Non-communal variation means |
|
Definition
| Variability in items is not attributed to the common sources. |
|
|
Term
| As alpha goes up, consistency goes |
|
Definition
| Up. |
|
|
Term
| Why is use of the term homogeneous (when referring to alpha) controversial? |
|
Definition
| (1) You can have a high alpha coefficient with more than one source of variance, and (2) Coefficient alpha is sensitive to the length of the scale (with higher alphas for longer scales). |
|
|
Term
| The cutoff for high alpha is |
|
Definition
|
|
Term
| The cutoff for temporal stability is |
|
Definition
|
|
Term
| Temporal stability ranges from |
|
Definition
|
|
Term
| Assumptions of temporal stability |
|
Definition
| (1) Variable being measured is stable, and (2) Error is not associated with time of measurement (error is assumed to be constant across time). |
|
|
Term
| Concerns surrounding test-retest reliability |
|
Definition
| (1) Accurate measurement of variable behavior/trait will result in low test-retest reliability, (2) Insensitive measure of volatile trait may yield artificially high test-retest reliability, (3) Third variable may moderate expression of trait, (4) Maturation effects, (5) Systematic oscillation, and (6) Fatigue effects |
|
|
Term
| Interobserver agreement (IOA) is |
|
Definition
| The degree to which independent observers record/score the same behavior in the same way. |
|
|
Term
|
Definition
| A ratio of agreements to disagreements in relation to the expected frequencies. |
|
|
Term
| Coefficient kappa |
|
Definition
| Percent agreement per chance removed. |
|
|
Term
| Formula for coefficient kappa |
|
Definition
| K = [Pr(a) − Pr(e)] / [1 − Pr(e)], with Pr(a) being observed agreement and Pr(e) chance-expected agreement. |
|
|
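A minimal sketch of the kappa computation (assuming NumPy; the agreement table is invented for illustration):
```python
import numpy as np

def cohens_kappa(table):
    """K = [Pr(a) - Pr(e)] / [1 - Pr(e)] from a square rater-agreement table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    pr_a = np.trace(table) / n                                      # observed agreement
    pr_e = (table.sum(axis=0) * table.sum(axis=1)).sum() / n ** 2   # chance agreement
    return (pr_a - pr_e) / (1 - pr_e)

# Rows = observer 1, columns = observer 2
print(round(cohens_kappa([[20, 5], [10, 15]]), 3))  # 0.4
```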
Term
| Validity coefficients are constrained by |
|
Definition
| The reliability of the measures involved (an unreliable measure cannot correlate highly with a criterion). |
|
|
Term
| Increased reliability means increased |
|
Definition
| Power. |
|
|
Term
| Power |
|
Definition
| Ability to detect mean differences between conditions. |
|
|
Term
| What is validity? |
|
Definition
| "Does the test measure what it purports to measure?" (Anatasi). "Are the scores produced by an instrument or assessment procedure meaningful?" (Cronbach & Meehl, 1955). |
|
|
Term
| What is the relationship between reliability and validity? |
|
Definition
| Reliability and validity are co-dependent: for a test to measure what it should measure, the variance in scores must be attributable to the true score of the latent variable (and vice versa). |
|
|
Term
| When developing a test, you should review the literature and define the construct of interest by... |
|
Definition
| Comprehensively searching appropriate databases, consulting experts in the area, consulting compendiums designed to list measures of limited distribution, and including behavior excesses/deficits, skills/weaknesses, and traits. |
|
|
Term
| When developing a test, the test planning and layout should consist of... |
|
Definition
| Selection of a representative sample (one that has the same levels of the relevant characteristics of the population to which you want to generalize), behavior domain, behavior sample, and table of specifications (lists processes, content, and number of items per cell) |
|
|
Term
| What is a behavior domain? |
|
Definition
| The entire range of behaviors the test is supposed to measure. |
|
|
Term
| What is a behavior sample? |
|
Definition
| All the behaviors the items actually cover (should represent the behavior domain). |
|
|
Term
| When developing a test, designing the test should include which processes? |
|
Definition
| Write instructions, every section of the test should begin with a new set of instructions, consider including a demographic section (unless it would impair performance), develop a test manual, ensure standardization of administration, and choose item type. |
|
|
Term
| When developing/designing a test, what should your test manual include? |
|
Definition
| Brief directions for administration, scoring, and interpretation of the test; appropriate use of the test; psychometric properties; and appropriate populations. |
|
|
Term
| When developing/designing a test, what would be some good characteristics of your instructions? |
|
Definition
| Brief, simple, clear, free of modifying clauses, and not offensive to any particular group. |
|
|
Term
| In developing/designing a test, what are some things to consider when choosing your item type? |
|
Definition
| Free response versus fixed response, timed versus untimed, avoid multiple item types, consider item difficulty/attractiveness, have colleagues review items for clarity, items should avoid difficult words, items should be grammatically simple, items should be emotionally neutral, and you initially want 1.5-2 times the number of items you hope to ultimately end up with. |
|
|
Term
| What are some examples of free response and fixed response items? |
|
Definition
Free response: How many times have you cried this week? _______________.
Fixed response: How many times have you cried this week?
(a) None
(b) Once
(c) 2-3 times
(d) 4 or more times |
|
|
Term
| What is a rule of thumb when using timed tests? |
|
Definition
| 90% of people should complete the test within the limit. You should also have separate limits for each section. |
|
|
Term
| When developing a test, why should you avoid multiple item types? |
|
Definition
| Having multiple item types complicates item analysis and interpretation. |
|
|
Term
| Define item attractiveness. |
|
Definition
| Asking a question in such a way that it increases the likelihood of eliciting a positive response. |
|
|
Term
| When developing a test, what are some key features you should look for when trying out your items on a sample? |
|
Definition
| Select a sample of sufficient size to reduce undue influence of outliers and a sample that is similar to the target population in relevant characteristics. Should have an N of 50-500, possibly oversampling, and be aware of things like missing data. |
|
|
Term
| What should be included in your consent form? |
|
Definition
| Types of tests, reason(s) for testing, intended use of results, consequences of that use, who will receive the results. |
|
|
Term
| What is the key thing you'll be looking for in item analysis? |
|
Definition
| Discriminating power - being able to distinguish between persons high and low on the characteristic being measured. |
|
|
Term
| Item analysis |
|
Definition
| Looks at the ability of an individual item to determine high or low scores (you don't want everyone to get the same answer and you want a range of difficulties to tap all levels of the construct). |
|
|
Term
| Item difficulty |
|
Definition
| Proportion of test takers (p) who give the keyed response. You want this to be ~.5 (unless dichotomous), but you should also have a range of item difficulties. |
|
|
Term
| Item discrimination index |
|
Definition
| Difference between the number of high and low scorers who get an item correct (or the number of persons with and without a disorder who endorse an item in the keyed direction). |
|
|
Term
| Formula for calculating item discrimination index |
|
Definition
D_i (item discrimination index) = N_hi / N_h − N_li / N_l
N_hi = number of persons in the high scoring group who passed item i
N_h = number of persons in the high scoring group
N_li = number of persons in the low scoring group who passed item i
N_l = number of persons in the low scoring group |
|
|
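A direct translation of the formula above into code (a sketch; the counts are hypothetical):
```python
def discrimination_index(n_hi, n_h, n_li, n_l):
    """D_i = N_hi / N_h - N_li / N_l; ranges from -1 to 1, higher = better discrimination."""
    return n_hi / n_h - n_li / n_l

# 40 of 50 high scorers pass the item vs. 15 of 50 low scorers
print(discrimination_index(40, 50, 15, 50))  # 0.5
```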
Term
| When building a scale, you want the following characteristics |
|
Definition
| Items with acceptable statistics, items with marginal statistics, statistically weak items that will be eliminated, use of factor analysis to make decisions (exploratory). |
|
|
Term
| How do you standardize a test? |
|
Definition
| Identify a standardization sample, compute reliability indices (consistency over time, conditions, scorers, forms, and internal consistency), and compute validity indices (convergent and divergent) |
|
|
Term
| Key point of item response theory when developing a test |
|
Definition
| Items assess different levels of some construct. |
|
|
Term
| Content validity |
|
Definition
| Degree to which elements of an assessment instrument are relevant to and representative of the targeted construct for the assessment purpose (Haynes et al., 1998). Determines whether the test is made up of stimuli calling for construct-relevant responses. The score on the measure is meant to describe the behavior in its own right, not as a sign of some abstraction or construct (Foster & Cone, 1995). |
|
|
Term
| Which type of validity concerns sampling adequacy? |
|
Definition
| Content validity. |
|
|
Term
| Two features of content validity |
|
Definition
| (1) Easiest to evaluate when the domain of interest is well-defined, and (2) A scale will have high content validity when its items constitute a randomly selected subset of items drawn from the universe of items. |
|
|
Term
| You don't want your items to |
|
Definition
| Measure facets outside the domain of interest. |
|
|
Term
| Content validity is compromised if... |
|
Definition
| (1) Items reflecting any important facets are omitted, (2) Items measuring facets outside the domain are included, and (3) Aggregate score is disproportionately influenced by any one facet. |
|
|
Term
| Questions to ask when generating items for each facet. |
|
Definition
| How many items? Will/should some facets be overrepresented (items must bear relationship to facet demarcations)? How do you know if the item truly taps into the facet component for which it was designed? What if item content overlaps? What is the criterion for deciding if items are too similar? What if facets overlap too much? How would you know? |
|
|
Term
| Content validation guidelines |
|
Definition
| Define domain/facets of construct and subject them to content validation before developing other elements of the measure, subject all elements of the measure to content validation, use population/expert sampling for initial generation of items/elements, use multiple judges of content validity and quantify using formalized scaling procedures, examine proportional representation of items, report results of content validation when reporting development of a new scale, and use subsequent analyses for instrument refinement. |
|
|
Term
| Dimensions of formalized scaling procedures |
|
Definition
| Relevance, specificity, clarity, representativeness. |
|
|
Term
| What is criterion validity? |
|
Definition
| An item or scale has an empirical association with some criterion or gold standard. Also called predictive validity. |
|
|
Term
| What are some characteristics of criterion validity? |
|
Definition
| Not driven by theory, practical, and not essential that the criterion follows the test administration. |
|
|
Term
| What are some types of criterion validity? |
|
Definition
| Predictive criterion validity, postdictive criterion validity, and concurrent criterion validity. |
|
|
Term
| Distinction between criterion validity and accuracy. |
|
Definition
| Correlation coefficients may not be good indices of criterion validity because a correlation cannot tell you how many did or did not succeed, nor the hit rate. |
|
|
Term
| Define construct validity |
|
Definition
| Directly concerned with the theoretical relationship of a variable (test score) to other variables. Put another way, it is the extent to which a measure accurately reflects the way that the construct it purports to measure should behave relative to measures of other related (and unrelated) constructs (Cronbach & Meehl, 1955). |
|
|
Term
| Methods for establishing construct validity |
|
Definition
| (1) Known-groups method (determines if your measure adequately distinguishes between different groups), and (2) Multimethod-multitrait matrix (fully crossed method-by-measure matrix that allows you to disentangle the variance attributable to effects of method of assessment and construct variance). |
|
|
Term
| Multimethod-multitrait matrix components |
|
Definition
| Heteromethod blocks, heterotrait-heteromethod triangles, validity diagonals, heterotrait-monomethod triangles, reliability diagonals, and monomethod blocks. |
|
|
Term
| Heteromethod blocks |
|
Definition
| Relationship between all 3 traits when manipulating method of assessment. |
|
|
Term
| Heterotrait-heteromethod triangles |
|
Definition
| Different traits measured by different methods (expected to show the weakest correlations). |
|
|
Term
| Validity diagonals |
|
Definition
| Relationship among similar traits using different methods - strongly related - low variability from measurement type. |
|
|
Term
| Heterotrait-monomethod triangles |
|
Definition
| Different trait, same method. |
|
|
Term
| Reliability diagonals |
|
Definition
| Relationship between measure and itself using same method (test-retest or internal consistency) |
|
|
Term
| Monomethod blocks |
|
Definition
| Relationship between similar and dissimilar traits holding the method constant. |
|
|
Term
| Hypothesized relational magnitudes of the multimethod-multitrait matrix |
|
Definition
| (In order of strongest to weakest) Same trait, same method; same trait, different method; and different trait, different method. |
|
|
Term
| Elements of construct validity |
|
Definition
| Convergent validity and discriminant validity |
|
|
Term
| Convergent validity |
|
Definition
| Measures that assess the same construct are highly correlated. |
|
|
Term
| Discriminant validity |
|
Definition
| Measures that assess different (or unrelated) constructs should be uncorrelated. |
|
|
Term
| How do you determine convergent validity? |
|
Definition
| Obtain scores on the new measure for a group of persons and scores on an independent measure of the same latent construct and correlate them. High correlation supports convergence and it is recommended that convergence be shown between independent approaches that are maximally different. |
|
|
Term
| Signal detection theory |
|
Definition
| The alternative to the correlation coefficient when determining criterion validity. This type of theory makes fine-grained distinctions between signal and noise. |
|
|
Term
| Typical signal detection procedure |
|
Definition
| Administer your questionnaire and the gold standard and use the gold standard to diagnose each respondent. Use your measure to dichotomize the sample into test positives or test negatives. Then, construct a 2X2 matrix to compare the questionnaire and gold standard in terms of results. |
|
|
Term
| Goal of 2X2 signal detection matrix |
|
Definition
| Maximize true positives and true negatives. |
|
|
Term
| True positive or hit |
|
Definition
| Positive test result (your test) and present diagnosis (gold standard) |
|
|
Term
| False positive or false alarm |
|
Definition
| Positive test result (your test) and absent diagnosis (gold standard). |
|
|
Term
| False negative or miss |
|
Definition
| Negative test result (your test) and present diagnosis (gold standard). |
|
|
Term
| True negative or correct rejection |
|
Definition
| Negative test result (your test) and absent diagnosis (gold standard). |
|
|
Term
| Define sensitivity. |
|
Definition
The probability of having a positive test result among those patients who have the disorder.
Ratio of hits to hits and misses (FN) |
|
|
Term
| Define specificity. |
|
Definition
The probability of having a negative test result among those patients who do not have the disorder.
Ratio of true negatives to true negatives and false alarms (FP). |
|
|
Term
| Define positive predictive power. |
|
Definition
The probability of having the disorder among those with a positive test result.
Ratio of hits to hits and false alarms (FP). |
|
|
Term
| Negative predictive power |
|
Definition
The probability of not having the disorder among those with a negative test result.
Ratio of true negatives to true negatives and misses (FN). |
|
|
Term
| Formula for kappa (in percent-agreement terms) |
|
Definition
| (% observed agreement − % chance agreement) / (100% − % chance agreement). |
|
|
Term
| What is the range of kappa? What are good values? |
|
Definition
Kappa ranges from −1 to 1, though in practice values fall between 0 and 1.
< .2 = poor agreement
.2-.4 = fair agreement
.4-.6 = moderate agreement
.6-.8 = good agreement
> .8 = excellent agreement |
|
|
Term
| What is the formula for prevalence? |
|
Definition
| Prevalence = (TP + FN) / N |
|
|
Term
| What is the formula for the level of a test? |
|
Definition
| Level (Q) = (TP + FP) / N |
|
|
Term
| What is the formula for sensitivity? |
|
Definition
| Sensitivity = TP / (TP + FN) |
|
|
Term
| What is the formula for specificity? |
|
Definition
| Specificity = TN / (TN + FP) |
|
|
Term
| What is the formula for efficiency? |
|
Definition
| Efficiency = (TP + TN) / N |
|
|
Term
| What is the formula for positive predictive power? |
|
Definition
| PPP = TP / (TP + FP) |
|
|
Term
| What is the formula for negative predictive power? |
|
Definition
| NPP = TN / (TN + FN) |
|
|
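All of the indices asked for in the preceding formula cards can be read directly off the 2X2 matrix. A sketch under the usual definitions (TP, FP, FN, TN are cell counts; the example numbers are invented):
```python
def two_by_two_indices(tp, fp, fn, tn):
    """Standard signal detection indices from 2X2 cell counts."""
    n = tp + fp + fn + tn
    return {
        "prevalence":  (tp + fn) / n,   # proportion with the disorder
        "level_Q":     (tp + fp) / n,   # proportion testing positive
        "sensitivity": tp / (tp + fn),  # hits / (hits + misses)
        "specificity": tn / (tn + fp),  # true negatives / (true negatives + false alarms)
        "PPP":         tp / (tp + fp),  # positive predictive power
        "NPP":         tn / (tn + fn),  # negative predictive power
        "efficiency":  (tp + tn) / n,   # overall correct classification
    }

print(two_by_two_indices(tp=30, fp=10, fn=5, tn=55))
```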
Term
| What is the formula for the standard error of prevalence? |
|
Definition
| SE(P) = [P(1 − P)/N]^(1/2) |
|
|
Term
| What is the formula for the standard error of true positives? |
|
Definition
| SE(TP) = [TP(1 − TP)/N]^(1/2) |
|
|
Term
| What is the formula for the standard error of false negatives? |
|
Definition
| SE(FN) = [FN(1 − FN)/N]^(1/2) |
|
|
Term
| What is the formula for the standard error of false positives? |
|
Definition
| SE(FP) = [FP(1 − FP)/N]^(1/2) |
|
|
Term
| What is the formula for the standard error of true negatives? |
|
Definition
| SE(TN) = [TN(1 − TN)/N]^(1/2) |
|
|
Term
| What is the formula for standard error of efficiency? |
|
Definition
| SE(EFF) = [EFF(1 − EFF)/N]^(1/2) |
|
|
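Each of the standard-error formulas above has the same form, sqrt(p(1 − p)/N). A one-function sketch (assuming N is the relevant sample size):
```python
import math

def proportion_se(p, n):
    """Standard error for a proportion-based index (TP, FN, FP, TN, EFF)."""
    return math.sqrt(p * (1 - p) / n)

print(round(proportion_se(0.85, 200), 4))  # SE of, e.g., an efficiency of .85 with N = 200
```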
Term
| What is the formula for the standard error of the level of a test? |
|
Definition
| SE(Q) = [Q(1 − Q)/N]^(1/2) |
|
|
Term
| True or false: Standard error (hat) is a biased estimator standard error. |
|
Definition
| True, SE^ underpredicts SE (a large N may counter this). |
|
|
Term
| Which of the following requires a larger sample size: low-risk population or high-risk population? |
|
Definition
| Low-risk populations: the test must be able to pick up on the signal of a diagnosis, and if a population has few true positives/hits, it becomes more difficult for a test to pick up on the hits unless there is a larger N (more hits). |
|
|
Term
| Minimum of __ people in each marginal of the 2X2 table yields unbiased estimators. |
|
Definition
| 10 |
|
|
Term
| What are the 3 types of sampling strategies? |
|
Definition
| Naturalistic sampling, retrospective sampling, and prospective sampling. |
|
|
Term
| What is naturalistic sampling? |
|
Definition
| A sampling strategy in which the evaluator decides how large a sample to gather (N) and takes a random (or representative) sample of that size from the population of interest - each patient receives a diagnosis and a test. |
|
|
Term
| What is retrospective sampling? |
|
Definition
| A sampling strategy in which a representative sample is drawn from the population and each patient is diagnosed. This is called the screening sample. Then, another random sample is drawn from among those in the screening sample with a positive diagnosis and another random sample from among those with a negative diagnosis. Note: each N must have a minimum of 10 people. |
|
|
Term
| What is prospective sampling? |
|
Definition
| A sampling strategy in which a representative sample of patients (the "screening sample") is drawn from the population and each patient is tested. Random samples of test positives and test negatives then receive a diagnosis. The proportion of patients with a positive test provides an unbiased estimator of the level of the test, Q. |
|
|
Term
| True or false: retrospective sampling will generally yield a less powerful test than naturalistic sampling. |
|
Definition
| False: retrospective sampling will generally yield a MORE powerful test than naturalistic sampling. |
|
|
Term
| For a random test in which p = 0, sensitivity equals |
|
Definition
| The level of the test (Q). |
|
|
Term
| For a random test in which p = 0, specificity equals |
|
Definition
| The complement of the level of the test. |
|
|
Term
| For a legitimate test in which p > 0, sensitivity ______ the level of the test |
|
Definition
| Exceeds |
|
|
Term
| For a legitimate test in which p > 0, specificity ______ the complement of the level of the test. |
|
Definition
| Exceeds |
|
|
Term
| Ideal value for sensitivity and specificity is ___. The range is ___ to ___ and ___ to ___, respectively. |
|
Definition
| 1 (or 100%). Sensitivity ranges from Q (the level of the test) to 1; specificity ranges from Q′ (the complement of the level) to 1. |
|
|
Term
| Any report of a test that does not report its level gives no indication of |
|
Definition
| Whether the test performs better than chance. |
|
|
Term
| Receiver operating characteristic (ROC) curve |
|
Definition
| A graphical representation of the relationship between sensitivity and specificity across different test cutoff points. Your goal is to select the test cutoff point that maximizes sensitivity and specificity. |
|
|
Term
| ROC curve (signal detection definition) |
|
Definition
| A graph showing the conditional probability of choosing alternative A when the alternative occurs (hit) plotted against the conditional probability of choosing alternative A when alternative B occurs (false alarm). |
|
|
Term
| If the point is at or under the ROC curve, it is a ____ test; if the point is above the ROC curve, it is a ____ test. |
|
Definition
| Poorer (no better than the comparison); better. |
|
|
Term
| What is the relationship between sensitivity and specificity? |
|
Definition
| As sensitivity increases, specificity decreases and as specificity increases, sensitivity decreases; they are negatively correlated. |
|
|
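A sketch of how the sensitivity-specificity trade-off traces out an ROC curve: sweep the cutoff and record both indices at each point (scores and diagnoses are simulated, not from the deck):
```python
import numpy as np

rng = np.random.default_rng(3)
truth = rng.random(1000) < 0.3                           # 30% prevalence
scores = rng.normal(loc=truth.astype(float), scale=1.0)  # disordered cases score higher

for cut in (-1.0, 0.0, 0.5, 1.0, 2.0):
    pred = scores >= cut
    sens = (pred & truth).sum() / truth.sum()        # rises as the cutoff drops
    spec = (~pred & ~truth).sum() / (~truth).sum()   # falls as the cutoff drops
    print(f"cutoff {cut:+.1f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```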
Term
| Some things to consider in incremental validity: |
|
Definition
Does the instrument:
Predict the phenomenon more validly/accurately?
Contribute meaningfully to predictive efficiency when added to an existing/readily obtainable measure?
Cost less than other measures? |
|
|
Term
| What is incremental validity? |
|
Definition
| The degree to which a measure explains or predicts some phenomenon of interest relative to other measures. Note: it is defined relative to other measures, which distinguishes it from other forms of validity, which are typically defined in absolute terms. |
|
|
Term
| Predictive power and incremental validity are a function of |
|
Definition
| How accurate other measures are. |
|
|
Term
| Implications of the definition of incremental validity |
|
Definition
| Multiple dimensions, inferences are dependent on the comparison measure(s) used, dependent on mode of assessment, dependent on criterion, inferences vary across target populations and samples, and it is conditional. |
|
|
Term
| In order to select the most appropriate dimension for study through incremental validity, you must |
|
Definition
| Determine how the measure will be used, select criteria on which validity inferences will be made, select alternative/comparison measures, and identify the population. |
|
|
Term
| Cost considerations in incremental validity |
|
Definition
What is the relative cost of acquiring new data compared with data from comparison measures? Is it cost effective? Does the ratio of costs to benefits of the new measure, relative to others, warrant its use?
Costs/benefits can be defined in terms of money, time, consequences of incorrect decisions, and each of these may vary as a function of the population. |
|
|
Term
| When to develop a new instrument: |
|
Definition
| Problems with item construction/content, predictive power, sensitivity to change, non-equivalent performance across samples. |
|
|
Term
| You want scores to be sensitive to change in the underlying construct because if they are not sensitive... |
|
Definition
| Changes will occur in one group and not the other. |
|
|
Term
| Things to consider when refining/creating a new measure |
|
Definition
| Item response methods to identify bias or poorly performing items, internal consistency, item-level temporal stability, interrater agreement indices, item factor loadings (aka item-total correlations), proportion of items performing poorly to guide decision, published versus unpublished literature. |
|
|
Term
| Data analysis for examining incremental validity involves the following components: |
|
Definition
| Administration of your measure along with comparison measures and some criterion, and creation of a correlation matrix between all variables (look for degree of collinearity/shared variance among predictors and strength of association between each predictor and the criterion). |
|
|
Term
| Goals of data analyses for examining incremental validity |
|
Definition
| Estimate relative proportions of variance, estimate unique variance predicted by new instrument, and examine interaction (moderator) effects associated with sex, age, SES, etc. |
|
|
Term
| How do you perform a data analysis for examining incremental validity? |
|
Definition
| Compute a hierarchical linear regression, looking for unique predictive power: force-enter the comparison measures first, then enter your measure; the change in R² is the index of incremental validity. |
|
|
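A minimal sketch of the ΔR² computation described above (assuming NumPy; the data and effect sizes are simulated for illustration):
```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit; X must include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 200
comparison = rng.normal(size=(n, 2))    # existing comparison measures
new_measure = rng.normal(size=(n, 1))   # the new instrument
y = comparison @ np.array([0.4, 0.3]) + 0.5 * new_measure[:, 0] + rng.normal(size=n)

ones = np.ones((n, 1))
X_base = np.hstack([ones, comparison])                  # step 1: comparison measures only
X_full = np.hstack([ones, comparison, new_measure])     # step 2: add the new measure

delta_r2 = r_squared(X_full, y) - r_squared(X_base, y)  # index of incremental validity
print(f"delta R^2 = {delta_r2:.3f}")
```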
Term
| The formula for tolerance is |
|
Definition
| 1-R2 (with R2 being a coefficient of determination). |
|
|
Term
| The formula for the variance inflation factor (VIF) is |
|
Definition
| VIF = 1 / tolerance = 1 / (1 − R²). |
|
|
Term
| Variance inflation factor (VIF) |
|
Definition
| Impact of collinearity on all independent and dependent variables in a model. It should be less than 10. |
|
|
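Tolerance and VIF can be computed by regressing each predictor on the others. A sketch (assuming NumPy; X is a hypothetical predictors-only matrix):
```python
import numpy as np

def vif(X, j):
    """VIF for predictor j = 1 / tolerance = 1 / (1 - R^2 of X_j on the other predictors)."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])  # intercept + remaining predictors
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ beta
    r2 = 1 - resid.var() / X[:, j].var()            # coefficient of determination
    return 1 / (1 - r2)                             # values above 10 flag collinearity
```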
Term
| If the magnitude of a relationship differs across groups, then |
|
Definition
| The relationship is moderated by another variable. |
|
|
Term
| In order to examine moderator effects, you need to first |
|
Definition
| Add main effects followed by interaction terms. |
|
|
Term
| Basic elements of a factor analysis are |
|
Definition
| A set of procedures designed to reduce a set of correlational data into a smaller set / reduce the data in a correlation matrix to better explain variance among the items; tests whether a single factor explains the interrelationships between items (initial premise: a single factor accounts for the pattern of correlations among the items). |
|
|
Term
|
Definition
| Begin with the correlation matrix, look for patterns of covariation, create a null hypothesis (single factor), sum items (estimate of latent construct), compute item-total correlations (ITC), compute projected inter-item correlations (IIC), and subtract the item-total correlations from the projected inter-item correlations (IIC − ITC). |
|
|
Term
| What are item-total correlations? |
|
Definition
| Proxies for correlation between item and construct. |
|
|
Term
| In a really good model, the inter-item correlations equal |
|
Definition
| The item-total correlations. |
|
|
Term
| The difference between the inter-item correlations and the item-total correlations is |
|
Definition
| The residual variance, from which you can create a residual matrix. |
|
|
Term
| Once you have a residual matrix from the IIC and ITC difference, you |
|
Definition
| Extract the second factor, compute correlations between items and the second latent variable, and generate a matrix of proposed correlations; if the second factor captured all the leftover covariation, you are done; if not, continue until no more factors can be extracted. |
|
|
Term
| Residual matrices are treated like |
|
Definition
| Correlation matrices (the extraction process is repeated on them). |
|
|
Term
| Purpose of factor analysis |
|
Definition
| Identify latent structure of assessment instrument, item refinement/scale development, relationship to content validity and construct validity of instrument. |
|
|
Term
| Exploratory/Common factor analysis (EFA) procedures |
|
Definition
| Identify underlying dimensions of the instrument; factor-analytically derived scales represent separate empirically derived dimensions (subscales); use the correlation/covariance matrix to identify subsets of like items; the analysis produces "factor loadings" that condense info from individual items. |
|
|
Term
| What is exploratory factor analysis? |
|
Definition
| A statistical method used to identify the underlying dimensions of an instrument. |
|
|
Term
| Goals of exploratory factor analysis |
|
Definition
| Identify factor loadings, refine instrument's content by using loadings to guide item retention, better understand latent construct. |
|
|
Term
| What is confirmatory factor analysis? (CFA) |
|
Definition
| A statistical method that is a more theoretically driven approach and requires an a priori hypothesis of some sort. |
|
|
Term
| Procedures of confirmatory factor analysis |
|
Definition
| Hypothesize item relationships in a model, fit data to model using SEM (structural equation modeling), use this information to test theoretical models. |
|
|
Term
| Recommended sample size for confirmatory factor analysis |
|
Definition
| 10:1 ratio. Monte Carlo studies - 5-10:1 ratio. For up to 40 variables, N should be at least 125. |
|
|
Term
| In CFA, factors are first extracted then |
|
Definition
| Rotated. |
|
|
Term
| What are the two types of factor rotation? |
|
Definition
| Orthogonal (uncorrelated) and oblique (correlated). |
|
|
Term
| Purpose of extraction (factor) |
|
Definition
| Percent variance explained by extracted factors. |
|
|
Term
| Purpose of factor analysis (with regards to extraction) |
|
Definition
| Extract as much variability as possible through as few factors as possible. |
|
|
Term
| Eigenvalues less than 1 indicate |
|
Definition
| The factor explains less variance than a single item (Kaiser criterion: retain factors with eigenvalues greater than 1). |
|
|
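A sketch of the eigenvalue-greater-than-1 rule applied to a simulated correlation matrix (assuming NumPy; the data are invented for illustration):
```python
import numpy as np

rng = np.random.default_rng(1)
items = rng.normal(size=(300, 6))
items[:, :3] += rng.normal(size=(300, 1))  # give the first three items a common factor

corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]  # largest first
print(eigenvalues.round(2))
print("factors to retain (eigenvalue > 1):", int((eigenvalues > 1).sum()))
```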
Term
| Interpret the scree plot by taking the "elbow" and |
|
Definition
| Retaining the factors above/before it. |
|
|
Term
| What is the purpose of multiple regression? |
|
Definition
| To learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. |
|
|
Term
| According to true score theory, if observed variables = true score plus error, then what is the measurement error equal to? |
|
Definition
| Observed score minus true score (or irrelevant sources of variance). |
|
|
Term
| If error consists of unmeaningful components of variance, what are meaningful components of variance? |
|
Definition
| True score variance (variance attributable to the latent variable). |
|
|
Term
| Explain the effects of unreliability in causal models. |
|
Definition
| If a causal variable has measurement error, the estimate of its effect is biased, as well as the effects of other variables in the structural equation. Measurement error in the effect variable does not bias its coefficient unless the variables are standardized. In this case, the bias is that the true beta equals the measured beta divided by the square root of the endogenous variable's reliability. |
|
|
Term
| What are some advantages of SEM? |
|
Definition
| Can handle random and non-random measurement error, can reject models, enables advanced treatment of missing data (full-information maximum likelihood), and can disentangle different variance and error sources. |
|
|
Term
| Fundamental premise of SEM |
|
Definition
| The observed covariance matrix is a function of a set of parameters. Relationships are being predicted, not scores. |
|
|
Term
| True or false: SEM procedures emphasize covariances and correlations rather than cases. |
|
Definition
| True. |
|
|
Term
| In regression, the procedure minimizes differences between observed and predicted values for individual cases. How is this different from SEM? |
|
Definition
| In SEM, the procedure minimizes differences between observed variances/correlations and the ones predicted by the model. |
|
|
Term
| What are the data of SEM? |
|
Definition
| Covariance/correlation matrices. |
|
|
Term
| Assumptions about the variables on which the matrix coefficients are based: |
|
Definition
| (1) They are intervally scaled, and (2) They have a multivariate normal distribution. |
|
|
Term
| When using SEM, in which situations would you increase your sample size? |
|
Definition
| If you are using complex models, models with weak relationships, models with few observed variables per factor, and non-normal distributions. |
|
|
Term
| Compare the new and old methods for determining sample size in SEM. |
|
Definition
Old: 5-10 cases per parameter estimate (usually a minimum of 100-150 total N).
New: 10-20 cases per parameter estimate using RMSEA. |
|
|
Term
| What is RMSEA? |
|
Definition
| Root Mean Squared Error of Approximation. A method used for showing error and misfit of the measure or data. Ideally, you want this value to be < .08. |
|
|
Term
| What was the start of SEM? |
|
Definition
| Path analysis (Wright's tracing rules). |
|
|
Term
| Define measurement models and list the constituent parts. |
|
Definition
| Measurement models are the mapping of measures/items onto theoretical constructs. The constituent parts are loadings of the measures onto theoretical constructs, error variances, and error covariances (correlated errors). |
|
|
Term
| Factor loading |
|
Definition
| Effect of latent variable on the measure; if a measure loads onto only one factor, the standardized loading is the measure's correlation with the factor and can be interpreted as the square root of the measure's reliability. |
|
|
Term
| Error variance |
|
Definition
| The variance in a measure not explained by the latent variable; does not imply that the variance is random or not meaningful, rather that it is unexplained by the latent variable. |
|
|
Term
| Define structural models and list the constituent parts. |
|
Definition
| Structural models are the causal and correlational links between theoretical (latent) variables. Constituent parts of this model are the paths, variances of the exogenous variables, variances of the disturbances of endogenous variables, covariances between disturbances, and covariances between disturbances and exogenous variables (usually set to zero). |
|
|
Term
| Define exogenous variables. |
|
Definition
| Variables not caused by another variable in the model. Usually, this variable causes one or more variables in the model. Think of this like an independent variable. |
|
|
Term
| Define endogenous variables. |
|
Definition
| Variables caused by one or more variables in the model. Note that an endogenous variable may also cause another endogenous variable in the model. Think of this like a dependent variable. |
|
|
Term
| Define standardized variable. |
|
Definition
| Variable whose mean is zero and variance is one. |
|
|
Term
| Define latent variable. |
|
Definition
| Variable in the model that is not measured. |
|
|
Term
| What are the components into which a correlation can be decomposed? |
|
Definition
| Direct effects, indirect effects, spurious effects (common causes), and unanalyzable components (correlated causes). |
|
|
Term
| What is the tracing rule? |
|
Definition
| The correlation between any pair of variables equals the sum of the products of the paths or correlations from each tracing. A tracing between two variables is any route in which the same variable is not entered twice and no variable is entered and left through an arrowhead (this applies only to hierarchical models/models with no feedback; tracings do not go through covariances). |
|
|
Term
| What are ways to scale latent variables? |
|
Definition
| Using correlations and Wright's tracing rules to solve for coefficients (assuming a variance of 1.0). |
|
|
Term
| If a variance/loading is not set to 1.0... |
|
Definition
| It will be impossible to solve the equation because there will be more unknowns than knowns. |
|
|
Term
| What are the steps in SEM? |
|
Definition
| (1) Specification, (2) Identification, (3) Estimation, and (4) Model fit. |
|
|
Term
| What is specification in SEM? |
|
Definition
| Statement of the theoretical model either as a set of equations or as a diagram. In this model, you cannot have more unknown variables than known variables. |
|
|
Term
| What is identification in SEM? |
|
Definition
| The model can, in theory and in practice, be estimated with observed data. |
|
|
Term
| What is estimation in SEM? |
|
Definition
| The model's parameters are statistically estimated from data. Multiple regression is one such estimation method, but typically more complicated estimation methods are used. |
|
|
Term
| What is model fit in SEM? |
|
Definition
| The estimated model parameters are used to predict the correlations/covariances between measured variables, and the predicted correlations/covariances are compared to the observed correlations/covariances. |
|
|
Term
| True or false: Recursive models have feedback loops. |
|
Definition
| False: recursive models do not have feedback loops; models with feedback loops are non-recursive. |
|
|
Term
| What is the chi-square test? |
|
Definition
| A test of model fit used in SEM. |
|
|
Term
| What are some characteristics of chi-square tests? |
|
Definition
| Sensitive to sample size (good for N of 75-100, otherwise larger sample sizes will always result in significant values) and size of the correlations (larger the correlation, poorer the fit). |
|
|
Term
| Why do you want chi-square to be nonsignificant? |
|
Definition
| Nonsignificance in chi-square tests shows that there are no significant differences between your data and your model (which is good and shows good fit). |
|
|
Term
| As sample size increases, power |
|
Definition
| Increases. |
|
|
Term
| Joreskog measurement of model fit |
|
Definition
| Goodness-of-fit index (GFI): the proportion of variance in the sample covariance matrix accounted for by the estimated population covariance matrix. This value ranges from 0 to 1, with a cutoff of .90; .95 is said to be good fit. |
|
|
Term
| Adjusted goodness of fit (GOF) index |
|
Definition
| A GOF method that penalizes lack of prediction and rewards parsimony. Ranges from 0 to 1, with a cutoff of .9 and good fit being .95. |
|
|
Term
| Incremental fit/Comparative indices |
|
Definition
| Compare your model with a fully saturated model and an independence model (e.g., take the chi-square of your model and compute the comparative fit index). |
|
|
Term
| What is the comparative fix index? |
|
Definition
| A revised form of the normed fit index that takes sample size into account. |
|
|
Term
| What are some characteristics of RMSEA? |
|
Definition
| You want small differences/residuals: < .08 is good and < .06 is optimal; a common standard is < .07. |
|
|
Term
| Good model fit is not the same as |
|
Definition
| A correct model (good fit does not prove the model is right). |
|
|
Term
| What are the different types of criteria to consider when evaluating a model? |
|
Definition
| Theoretical, technical, statistical. |
|
|
Term
| What are theoretical criteria of model evaluation? |
|
Definition
| Appropriateness of general causal structure, inclusion of the right variables, and reasonableness of results in light of previous knowledge. |
|
|
Term
| What are technical criteria of model evaluation? |
|
Definition
| Identification status, appropriate estimation method, and appropriateness of instrumental variables. |
|
|
Term
| What are statistical criteria of model evaluation? |
|
Definition
| Reasonable parameter values, substantial coefficients linking measured variables to factor, latent endogenous variables are well explained, and fit. |
|
|
Term
| What does an independence model of fit assume? |
|
Definition
| That nothing correlates: no relationships among the observed variables are estimated. |
|
|
Term
| What does a saturated model of fit assume? |
|
Definition
| That all observed variables are free to relate: everything intercorrelates. |
|
|
Term
| What is the premise of parsimony of a model? |
|
Definition
| Simple models are better. |
|
|
Term
| What is the purpose of a nested model of fit? |
|
Definition
| To directly compare several models of fit. |
|
|
Term
| Define model identification |
|
Definition
| A unique solution for the model's parameters exists. This two-step process involves testing the adequacy of fit of the individual measurement models and (if these tests are acceptable) testing the structural model for fit once the knowns at least equal the unknowns. |
|
|
Term
| What is the minimum condition of identifiability? |
|
Definition
| The number of known values must equal or exceed the number of free parameters in the model. If this rule is not met, the model is not identified. |
|
|
Term
| What is a justidentified/saturated model? |
|
Definition
| An identified model in which the number of free parameters exactly equals the number of known values; a model with zero degrees of freedom. This model cannot yield a meaningful test of fit. |
|
|
Term
| What is an underidentified model? |
|
Definition
| A model for which it is not possible to estimate all of the model's parameters because there are more unknown values than known values. |
|
|
Term
| What is an overidentified model? |
|
Definition
| A model for which all the parameters are identified and for which there are more knowns than free parameters. It places constraints on the correlation/covariance matrix and more known values than unknown values exist in this model. |
|
|
Term
| What is empirical underidentification? |
|
Definition
| A model which is theoretically identified, but one or more of the parameter estimates has a denominator that equals a very small value, making estimates unstable. This cannot be solved by hand. |
|
|
Term
| What is an example of empirical underidentification? |
|
Definition
| A path analysis model with high multicollinearity. |
|
|
Term
|
Definition
| The ability of an overidentified model to reproduce the variables' correlation or covariance matrix. |
|
|
Term
| How do you count the number of knowns (observed covariances)? |
|
Definition
For standard specification, the number of covariances where n is the number of variables:
n(n+1)/2.
For path analytic specification: n(n-1)/2.
|
|
|
Term
| What is a linear constraint? |
|
Definition
| Setting of a parameter equal to some function of other parameters (e.g., setting one parameter equal to another). |
|
|
Term
| How do you figure out the degrees of freedom of a model? |
|
Definition
| The number of knowns minus the number of free parameters. |
|
|
Term
| What is the covariance between two variables equal to? |
|
Definition
| The correlation times the product of the variables' standard deviations. The covariance of a variable with itself is the variable's variance. |
|
|
Term
| What are the free parameters in a structural model when using standard specification paths? |
|
Definition
| Covariances between the exogenous variables, the disturbances, the exogenous variables and disturbances, the variances of the exogenous variables, and the disturbances of endogenous variables less the number of linear constraints. |
|
|
Term
| What are the free parameters in a structural model using path analytic specification? |
|
Definition
| Paths and correlations between exogenous variables, disturbances and the exogenous variables, and the disturbances less the number of linear constraints. |
|
|
Term
| What is multigroup analysis? |
|
Definition
| Testing for measurement invariance across groups (multigroup modeling). |
|
|
Term
| What is the procedure of multigroup analysis? |
|
Definition
| Test for measurement invariance by first fitting the model for all groups combined, then fitting a model where parameters are constrained to be equal between the groups. If the chi-square difference between the original and constrained-equal models is not significant, it is concluded that the model measures invariantly across groups. |
|
|
Term
| What are some parameters that can be constrained to define measurement invariance? |
|
Definition
| Invariance on number of factors, invariant factor loadings, invariant structural relations among the latent variables, and equality of error variances and covariances across groups. |
|
|
Term
| If lack of measurement invariance is found, this means that |
|
Definition
| The meaning of the latent construct is shifting across groups or over time. |
|
|
Term
| When does interpretational confounding occur when determining measurement invariance? |
|
Definition
| When there is substantial measurement non-invariance, because the factor loadings are used to induce the meaning of the latent variable. If the factor loadings differ substantially across groups or time, then the induced meanings of the factors will differ substantially even if the same factor label is retained. |
|
|
Term
| Why are one-sample models tested separately first when testing for multigroup variance? |
|
Definition
| Separate testing provides an overview of how consistent the model results are, but it does not constitute testing for significant differences in the model's parameters between groups. |
|
|
Term
| If there is consistency in the multigroup invariance analysis, multigroup testing will proceed. Explain this process. |
|
Definition
| (1) Calculate a baseline chi-square by computing model fit for the pooled sample of all groups, (2) Add constraints that various model parameters must be equal across groups and fit the model, (3) Complete a chi-square difference test to determine if the difference is significant, and (4) A nonsignificant difference indicates that the constrained-equal model is the same as the unconstrained multigroup model, meaning that the model applies across groups and displays measurement invariance. |
|
|
Term
| The constrained model expects factor loadings to be equal for |
|
Definition
| Each class of the grouping variable. |
|
|
Term
| What is multiple group confirmatory factor analysis? |
|
Definition
| Using SEM and the measurement invariance test, the chi-square difference is calculated in order to assess whether a set of indicators reflects a latent variable equally well across groups in the sample. |
|
|
Term
| What is the partial measurement invariance test (Kline, 1998)? |
|
Definition
| If the model fails the measurement invariance test, some indicators may still be invariant, so we examine each indicator for group invariance. |
|
|
Term
| Because standard errors of factor loadings cannot be computed, there are _______ methods but no ______ method for comparing models across groups. |
|
Definition
|
|
Term
| What are some characteristics of item response theory (IRT)? |
|
Definition
| It is mathematically complex and traced back to the 1940s. It was originally used for testing cognitive ability and due to increased computing power and widespread technology, it has been much more developed in recent years. |
|
|
Term
| What are some implications for scale development when using CMT? |
|
Definition
| Emphasis on redundancy, factor analysis, identification of several items that all appear to measure the same thing and placing them on the same scale, and longer scales = more reliable measures. |
|
|
Term
| What are some characteristics of IRT? |
|
Definition
| Items arranged on a continuum measure one attribute, passing an item implies greater attribute possession, items differ in level of difficulty but still measure the same attribute, and attributes can vary in broadness and specificity. |
|
|
Term
| What are the three parameters of IRT? |
|
Definition
| Item difficulty, item discrimination, and false positives. |
|
|
Term
| What is item difficulty in IRT? |
|
Definition
| The more difficult an item, the more of the attribute required to pass it. Items should therefore differ in terms of difficulty across the attribute continuum. |
|
|
Term
| What is item discrimination in IRT? |
|
Definition
| Items should do a good job of discriminating correct from incorrect answers (e.g., unambiguous classification of "pass" and "fail"). |
|
|
Term
| What are false positives in IRT? |
|
Definition
| Response that suggests attribute exists/is present when it doesn't/isn't. Good items minimize false positives. |
|
|
Term
| What is the item characteristic curve? |
|
Definition
| Relates the value of the latent trait (theta) to the probability of a positive/correct answer. The scaling constant is 1.702. |
|
|
Term
| What are the types of common unidimensional IRT models? |
|
Definition
| Dichotomous and polytomous. |
|
|
Term
| What are some of the dichotomous models in common unidimensional IRT models? |
|
Definition
| 1, 2, or 3 parameter logistic models. |
|
|
Term
| What are the polytomous models of the common unidimensional IRT model? |
|
Definition
| Samejima's Graded Response Model and Masters' Partial Credit Model. |
|
|
Term
| In a 1 parameter logistic model, the only parameter estimated is |
|
Definition
| Difficulty (b). |
|
|
Term
| A 1 parameter logistic model specifies that all items are |
|
Definition
| Equal in discrimination. |
|
|
Term
| A 2 parameter logistic model consists of |
|
Definition
| Difficulty and discrimination. |
|
|
Term
| A 3 parameter logistic model consists of |
|
Definition
| Difficulty, discrimination, and c (a pseudo-guessing parameter). |
|
|
Term
| How do IRT and CMT differ in terms of SEM as a function of trait level? |
|
Definition
CMT assumption: The standard error of measurement applies to all scores in a population.
IRT alternative: The standard error of measurement differs across scores (or response patterns) but generalizes across populations. |
|
|
Term
| How do IRT and CMT differ in terms of test length and reliability? |
|
Definition
CMT assumption: Longer tests are more reliable than shorter tests.
IRT alternative: Shorter tests can be more reliable than longer tests. |
|
|
Term
| How do IRT and CMT differ in terms of assumptions? |
|
Definition
CMT assumption: Comparing test scores across multiple forms depends on test parallelism or adequate equating.
IRT alternative: Comparing scores from multiple forms is optimal when test difficulty levels vary across persons.
CMT assumption: Meaningful scale scores are obtained by comparisons of position in a score distribution.
IRT alternative: Meaningful scale scores are obtained by comparisons of distances from various items. |
|
|
Term
| True or false: the initial extraction (unrotated) indicates that there should be exactly as many items as factors. |
|
Definition
| True. |
|
|
Term
| Initial communality of extraction of factors is always equal to |
|
Definition
| 1.0. |
|
|
Term
| What is the extracted communality in factor extraction? |
|
Definition
| The percent of variance in the item explained by the factor. |
|
|
Term
| Factor extraction/analysis is driven by which procedure? |
|
Definition
| Internal consistency reliability. |
|
|
Term
| After looking at communalities, what is the approximate appropriate number of items per factor in factor analysis? |
|
Definition
| Approximately three items per factor. |
|
|
Term
| In the item characteristic curve (ICC), a = |
|
Definition
| Discrimination (the slope). |
|
|
Term
| In the item characteristic curve (ICC), b = |
|
Definition
| Difficulty (the location on the trait continuum). |
|
|
Term
| In the item characteristic curve (ICC), c = |
|
Definition
| Probability of (at random) selecting a keyed response (through the use of a pseudo guessing parameter). This is not used often outside of cognitive assessment. |
|
|
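A sketch of a three-parameter logistic ICC using the a, b, and c parameters from the cards above and the deck's scaling constant of 1.702 (assuming NumPy; the parameter values are invented):
```python
import numpy as np

def icc_3pl(theta, a, b, c=0.0, D=1.702):
    """P(keyed response | theta) with discrimination a, difficulty b, pseudo-guessing c."""
    return c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))

theta = np.array([-2.0, 0.0, 2.0])
print(icc_3pl(theta, a=1.5, b=0.5, c=0.2).round(3))  # steeper a -> sharper rise around b
```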
Term
| Steeper ICC slopes indicate |
|
Definition
| Greater discrimination. |
|
|
Term
| Difficulty is related to ______ in CMT. |
|
Definition
| The proportion correct score. |
|
|
Term
| Difficulty in ICC is the inverse of |
|
Definition
| The probability of getting an item correct (as the probability of a correct response decreases, difficulty increases). |
|
|
Term
| Why are steeper ICC slopes useful? |
|
Definition
| Better able to distinguish those above/below the level of theta. |
|
|
Term
| As the slope of the ICC goes up, the curve |
|
Definition
| Rises more sharply (approaches a step function). |
|
|
Term
| As the difficulty in ICC goes up, the slope |
|
Definition
| Shifts further to the right. |
|
|
Term
| SEM (standard error of measurement) = |
|
Definition
| SD√(1-r), with r being reliability. |
|
|
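A direct translation of the SEM formula above (a sketch; the SD and reliability values are hypothetical):
```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r); as reliability rises, SEM falls."""
    return sd * math.sqrt(1 - reliability)

print(round(standard_error_of_measurement(15, 0.90), 2))  # e.g., an IQ-style scale: ~4.74
```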
Term
| What is the standard error of measurement negatively correlated to? |
|
Definition
| Reliability: as reliability (r) increases, the standard error of measurement decreases. |
|
|