Term
| Goal of domain estimation |
|
Definition
| To estimate and compare parameters of subpopulations (i.e., domains). |
|
|
Term
| When to use domain estimation |
|
Definition
| When we want to estimate and compare subpopulation parameters, but the sample (e.g., an SRSWOR sample) was not specifically designed to estimate parameters for the domains. |
|
|
Term
| T/F Population sizes must be known for domain estimation |
|
Definition
| False. The domain population sizes (Nd) may be unknown. |
|
|
Term
| Is sample size of a domain in domain estimation fixed or random? |
|
Definition
| Random, unlike stratification, where the allocation is fixed. We don't know nd until after the data have been collected, and the value of nd changes from sample to sample. |
|
|
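A small simulation makes the randomness of nd concrete. This is a minimal sketch on a hypothetical population: N = 1000 units, 300 of them in domain d. The sizes and the helper name `domain_sample_size` are illustrative assumptions. Every SRSWOR draw has the same n, but the number of sampled domain units changes from draw to draw.

```python
import random

# Hypothetical population: N = 1000 units; units 0..299 belong to domain d.
N, n = 1000, 100
in_domain = [i < 300 for i in range(N)]


def domain_sample_size(sample, in_domain):
    """Count the sampled units that fall in the domain (n_d)."""
    return sum(in_domain[i] for i in sample)


rng = random.Random(1)
for rep in range(5):
    sample = rng.sample(range(N), n)  # SRSWOR: the total sample size n is fixed
    n_d = domain_sample_size(sample, in_domain)
    print(f"sample {rep + 1}: n = {len(sample)}, n_d = {n_d}")  # n_d changes
```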
Term
| T/F The value of nd is the same from sample to sample in domain estimation. |
|
Definition
| False. The value of nd changes from sample to sample in domain estimation. |
|
|
Term
| Is the total sample size in domain estimation fixed or random? |
|
Definition
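| Fixed. The total sample size n is set by the design (e.g., SRSWOR); only the domain sample sizes nd are random. |
|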
|
Term
| Ud = what in domain estimation? |
|
Definition
| Index set of the population units that belong to domain d |
|
|
Term
| Ad = what in domain estimation? |
|
Definition
| Index set of the sampled units that belong to domain d |
|
|
Term
| look at population parameter formulas for domain estimation on slide nine of week 11 |
|
Definition
|
|
Term
| T/F Domain estimation is a good estimation when Nd is known. |
|
Definition
False
If Nd were known, then we would want to use STS (stratifying by domain) instead. |
|
|
Term
| Variables u and x in domain estimation |
|
Definition
Numerator variable u: ui = yi if i in Ud, 0 otherwise (slide 12, week 11)
Denominator variable x: indicator of domain membership; xi = 1 if i in Ud, 0 otherwise
The domain mean is then estimated as the ratio ubar/xbar. |
|
|
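The u/x construction turns a domain mean into a ratio estimator. Below is a minimal sketch under SRSWOR, assuming the sample arrives as parallel lists of y-values and domain flags. The function name is hypothetical. The standard-error line uses the usual linearized ratio-estimator variance, (1 - n/N) s_e^2/(n xbar^2) with e_i = u_i - ybar_d x_i; treat that as an assumption to check against the formula sheet.

```python
import math


def domain_estimates(y, in_domain, N):
    """Estimate a domain mean and total from an SRSWOR sample via the ratio ubar/xbar.

    y: sampled y-values; in_domain: matching booleans; N: population size.
    Assumes at least one sampled unit falls in the domain (xbar > 0).
    """
    n = len(y)
    u = [yi if d else 0.0 for yi, d in zip(y, in_domain)]  # u_i = y_i if i in Ud, else 0
    x = [1.0 if d else 0.0 for d in in_domain]              # x_i = 1 if i in Ud, else 0
    u_bar, x_bar = sum(u) / n, sum(x) / n
    ybar_d = u_bar / x_bar                                  # = (sum of y over Ad) / n_d
    t_d = N * u_bar                                         # = (N/n) * sum(u_i)
    e = [ui - ybar_d * xi for ui, xi in zip(u, x)]          # ratio residuals
    s2_e = sum(ei ** 2 for ei in e) / (n - 1)
    var_ybar_d = (1 - n / N) * s2_e / (n * x_bar ** 2)
    return ybar_d, t_d, math.sqrt(var_ybar_d)


y = [4.0, 7.0, 5.0, 9.0, 3.0, 6.0]
d = [True, False, True, True, False, False]
ybar_d, t_d, se = domain_estimates(y, d, N=600)
print(ybar_d, t_d)  # 6.0 1800.0
```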
Term
| *Study all formulas on formula sheet! Know what is what!* |
|
Definition
|
|
Term
| What are the null and alternative hypotheses when testing whether two domain population means are equal? |
|
Definition
H0: ybarU1 = ybarU2
H1: ybarU1 ≠ ybarU2
Equivalently:
H0: ybarU1 - ybarU2 = 0
H1: ybarU1 - ybarU2 ≠ 0 |
|
|
Term
| Formula for the z-test statistic for comparing two domain means (need to know?) |
|
Definition
z = (ybar1 - ybar2)/sqrt(V̂(ybar1) + V̂(ybar2))
Reject H0 if abs(z) > zα/2 |
|
|
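The z-test on this card can be computed directly once the two domain means and their estimated variances are available. This is a minimal sketch. Like the card's formula, it ignores any covariance between the two domain estimates. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def two_domain_z_test(ybar1, var1, ybar2, var2, alpha=0.05):
    """z-test of H0: ybarU1 = ybarU2 using estimated variances of the two domain means."""
    z = (ybar1 - ybar2) / math.sqrt(var1 + var2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    return z, z_crit, abs(z) > z_crit             # last item: reject H0?


z, z_crit, reject = two_domain_z_test(6.0, 0.25, 5.0, 0.39)
print(round(z, 3), round(z_crit, 3), reject)  # z is about 1.25, so H0 is not rejected
```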
Term
| What is the formula for calculating a confidence interval? |
|
Definition
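| estimate ± zα/2 × SE(estimate), where SE = sqrt(V̂(estimate)); e.g., for two domain means: (ybar1 - ybar2) ± zα/2 × sqrt(V̂(ybar1) + V̂(ybar2)) |
|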
|
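A large-sample confidence interval has the form estimate ± zα/2 × SE. This is a minimal sketch assuming the normal approximation used elsewhere in these cards. Some designs call for a t critical value instead. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def confidence_interval(estimate, var_hat, level=0.95):
    """Normal-approximation CI: estimate ± z_{alpha/2} * sqrt(var_hat)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * math.sqrt(var_hat)
    return estimate - half_width, estimate + half_width


lo, hi = confidence_interval(10.0, 4.0)  # SE = 2
print(round(lo, 2), round(hi, 2))
```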
Term
| Impact of nonresponse (2) |
|
Definition
Potential bias
Loss of precision |
|
|
Term
| Strategies to reduce nonresponse (NR) |
|
Definition
Design phase: design the survey to prevent NR.
After data collection: call-backs, post-stratification, imputation |
|
|
Term
| Which type of nonresponse is each of the three post-data-collection strategies usually used to fix? |
|
Definition
Call-backs: both unit and item nonresponse.
Post-stratification: primarily unit nonresponse.
Imputation: primarily item nonresponse. |
|
|
Term
| Formula for response rate |
|
Definition
nR/n
where nR = realized sample size (number of respondents) and n = selected sample size |
|
|
Term
| Nonresponse framework and population parameters |
|
Definition
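The population of N units splits into a responding subpopulation (R, size NR) and a nonresponding subpopulation (M, size NM): N = NR + NM
ybarU = (NR/N) × ybarRU + (NM/N) × ybarMU |
|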
|
Term
| Nonresponse sample framework. Graphic of N, M, R, NM, NR, nM, nR, n |
|
Definition
|
|
Term
| Nonresponse bias |
|
Definition
| Occurs when differences exist between the population mean of y for the nonresponding subpopulation ybarMU and the population mean of y for the responding subpopulation ybarRU |
|
|
Term
| What does the magnitude of nonresponse bias depend on? |
|
Definition
Differences between population means
Nonresponse rate |
|
|
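Writing ybarU as a weighted combination of the two subpopulation means shows why the bias depends on both factors: the bias of the respondent mean is ybarRU - ybarU = (NM/N)(ybarRU - ybarMU). This is a minimal sketch assuming this deterministic split of the population into NR respondents and NM nonrespondents. The function name and the numbers are hypothetical.

```python
def nonresponse_bias(N_R, N_M, ybar_RU, ybar_MU):
    """Bias of the respondent mean as an estimator of ybarU."""
    N = N_R + N_M
    ybar_U = (N_R * ybar_RU + N_M * ybar_MU) / N  # population mean of y
    return ybar_RU - ybar_U                        # equals (N_M / N) * (ybar_RU - ybar_MU)


# Same 10-point gap between the means, two nonresponse rates
print(nonresponse_bias(900, 100, 50.0, 40.0))  # 10% nonresponse -> 1.0
print(nonresponse_bias(700, 300, 50.0, 40.0))  # 30% nonresponse -> 3.0
```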
Term
| How does nonresponse reduce precision and how can you remedy this? |
|
Definition
Sample size reductions due to NR affect precision by increasing variances.
Remedy by anticipating and designing for NR sample size attrition
Method: divide the target sample size (n) by the anticipated proportion of respondents (R = NR/N). Formula: n/R |
|
|
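The n/R adjustment is a single division, rounded up so the expected number of respondents is at least n. This is a minimal sketch; the function name is hypothetical and R is a guessed response proportion.

```python
import math


def inflated_sample_size(n_target, response_rate):
    """Initial sample size so that about n_target units respond: n / R, rounded up."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(n_target / response_rate)


print(inflated_sample_size(300, 0.75))  # 400
```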
Term
| What is the best strategy for addressing nonresponse bias? |
|
Definition
| Design survey to prevent NR |
|
|
Term
| Using data from call-backs of NR cases to adjust for bias - steps in process. |
|
Definition
Select a sample from the nonrespondents to the survey.
Collect data from contacted nonrespondents.
Use these data to estimate population mean for nonrespondents ybarMU.
Estimate the population mean for the whole population (ybarU) with a weighted combination of the respondent sample mean and the nonrespondent (call-back) sample mean. |
|
|
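The weighted combination in the last step can be sketched as (nR/n) × ybarR + (nM/n) × ybarM,sub, where ybarM,sub is the mean of the call-back subsample. This is a minimal sketch assuming every nonrespondent selected for call-backs provides data. The function name and numbers are hypothetical.

```python
from statistics import mean


def callback_estimate(y_respondents, y_callbacks, n_nonrespondents):
    """Combine the respondent mean with the mean of the call-back subsample.

    y_respondents: y for the n_R initial respondents
    y_callbacks: y for the subsample of nonrespondents reached by call-backs
    n_nonrespondents: n_M, the total number of initial nonrespondents
    """
    n_R = len(y_respondents)
    n = n_R + n_nonrespondents
    return (n_R / n) * mean(y_respondents) + (n_nonrespondents / n) * mean(y_callbacks)


print(callback_estimate([10, 12, 14], [20, 22], n_nonrespondents=3))  # 16.5
```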
Term
| Is the estimator of population mean using the callback method to deal with nonresponse biased or unbiased? |
|
Definition
| Unbiased, provided that every nonrespondent selected for call-backs provides data. |
|
|
Term
| Post-stratification as a remedy for nonresponse: steps |
|
Definition
Divide the population into H mutually exclusive and exhaustive post-strata.
For each post-stratum: know the post-stratum size Nh, estimate the characteristics of the post-stratum, and use the post-stratum sample mean to estimate the post-stratum population mean.
Pool the post-stratum estimates using, for example, a weighted mean of the post-stratum estimates (weights Nh/N).
|
|
|
Term
| What is the formula for the sampling weight in post-stratification design? |
|
Definition
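| wi = Nh/nh for responding unit i in post-stratum h (under SRS), where nh = number of respondents in post-stratum h |
|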
|
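Post-stratification can be sketched as a weighted mean of the post-stratum respondent means, sum over h of (Nh/N) × ybarh. The same estimate results from giving each respondent the weight Nh/nh. This is a minimal sketch assuming an SRS base design and known post-stratum sizes. The function name and the data are hypothetical.

```python
from collections import defaultdict


def post_stratified_mean(y, post_stratum, N_h):
    """Post-stratified mean sum_h (N_h / N) * ybar_h, plus weights w_i = N_h / n_h.

    y: responses; post_stratum: post-stratum label of each respondent;
    N_h: dict mapping each label to its known population size.
    """
    groups = defaultdict(list)
    for yi, h in zip(y, post_stratum):
        groups[h].append(yi)
    missing = set(N_h) - set(groups)
    if missing:
        raise ValueError(f"no respondents in post-strata: {sorted(missing)}")
    N = sum(N_h.values())
    estimate = sum((N_h[h] / N) * (sum(v) / len(v)) for h, v in groups.items())
    weights = [N_h[h] / len(groups[h]) for h in post_stratum]  # w_i = N_h / n_h
    return estimate, weights


est, w = post_stratified_mean([10, 20, 30, 40], ["a", "a", "b", "b"], {"a": 600, "b": 400})
print(est, w)
```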
Term
| Assumptions in post-stratification adjustment. |
|
Definition
| Distribution of y is approximately equal for responding portion of post-stratum population and nonresponding portion of post-stratum population. |
|
|
Term
| What does imputation as a strategy for dealing with nonresponse do? |
|
Definition
A statistical method for "filling in" or "predicting" missing values.
Impute values so that they represent the distribution of the response variable with missing data (y).
Impute values using a method that supports estimation of the variance associated with the random components of the imputation process. |
|
|
Term
| Types of imputation (5) |
|
Definition
Deductive imputation
Cell mean imputation
Hot-deck imputation (random)
Regression imputation
Multiple imputation |
|
|
Term
| Deductive imputation |
|
Definition
Common-sense method, but rarely implementable.
Use a deterministic rule to assign a value (e.g., if the answer to "crime victim?" is no, then "violent crime victim?" is also no).
There must be sufficient information to identify the missing value with a high degree of certainty.
Relatively uncommon, especially with computer-assisted survey instruments, where checks for these relationships are embedded in the computer-based questionnaire. |
|
|
Term
| Cell mean imputation |
|
Definition
Avoid: leads to an incorrect distribution of y in the dataset.
Divide responding and nonresponding units into imputation classes.
Within a given imputation class: calculate the average of the available item data in the class, and fill in the missing value for each nonresponding unit with that average.
Retains the mean estimate for an imputation class, but underestimates the variance within the class, which misrepresents the distribution of y.
|
|
|
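A tiny example shows why cell mean imputation keeps the class mean but shrinks the spread of y. This is a minimal sketch with `None` marking missing values. The function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, variance


def cell_mean_impute(y, classes):
    """Replace missing values (None) with the respondent mean of the same imputation class."""
    observed = defaultdict(list)
    for yi, c in zip(y, classes):
        if yi is not None:
            observed[c].append(yi)
    class_means = {c: mean(v) for c, v in observed.items()}
    # A class with no respondents would raise KeyError: every class needs respondents.
    return [class_means[c] if yi is None else yi for yi, c in zip(y, classes)]


filled = cell_mean_impute([2, 4, None, 6, None], ["a"] * 5)
print(filled)  # missing values become the class mean, 4
print(variance([2, 4, 6]), variance(filled))  # the spread shrinks after imputation
```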
Term
| Hot-deck imputation |
|
Definition
Most common and generally applicable.
May apply within groups of respondents (auxiliary info).
Divide responding and nonresponding units into imputation classes. Within a given imputation class: randomly select a donor from the responding units in the class, and fill in the missing value for the nonresponding unit with the donor's value.
Retains variation in individual values; can impute many variables from the same donor; many variations of the method exist |
|
|
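Random hot-deck within imputation classes can be sketched as below. Unlike cell mean imputation, imputed values are real observed values, so the spread of y is retained. This is a minimal sketch: it draws donors with replacement, independently for each missing value, which is only one of the many variants. The function name and data are hypothetical.

```python
import random
from collections import defaultdict


def hot_deck_impute(y, classes, rng):
    """Fill each missing value (None) with the value of a random donor from the same class."""
    donors = defaultdict(list)
    for yi, c in zip(y, classes):
        if yi is not None:
            donors[c].append(yi)
    # Donors are drawn with replacement; a class with no donors raises IndexError.
    return [rng.choice(donors[c]) if yi is None else yi for yi, c in zip(y, classes)]


print(hot_deck_impute([2, 4, None, 6, None], ["a"] * 5, random.Random(42)))
```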
Term
| Regression imputation |
|
Definition
Uses a model to incorporate auxiliary information; falls between hot-deck and cell mean imputation methods.
Use a regression model to relate covariate(s) to the variable with missing data.
Estimate the regression parameters with data from responding units, and fill in each missing value with its predicted value.
Useful if a strong relationship exists that provides a better predicted value for the missing data. A form of (conditional) mean imputation. Requires a separate model for each variable with missing data. |
|
|
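With one covariate, regression imputation reduces to fitting a least-squares line on the respondents and plugging each nonrespondent's covariate into it. This is a minimal sketch assuming a single covariate x observed for every unit and a simple linear model. The function name and data are hypothetical.

```python
def regression_impute(x, y):
    """Fill missing y (None) with predictions from a least-squares line fit to respondents."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    x_bar = sum(xi for xi, _ in pairs) / n
    y_bar = sum(yi for _, yi in pairs) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Predicted (conditional mean) values replace the missing items.
    return [intercept + slope * xi if yi is None else yi for xi, yi in zip(x, y)]


print(regression_impute([1, 2, 3, 4, 5], [2, 4, None, 8, None]))
```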
Term
| Multiple imputation |
|
Definition
Accounts for variation due to the imputation process.
Decide on an imputation model and impute m > 1 values for each missing data item; the result is m (different) data sets with no missing values.
Variation in estimates across data sets provides an estimate of the variability associated with the imputation process; the analysis is more complex. |
|
|
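One standard way to turn the m analyses into a single estimate and variance is Rubin's combining rules. These rules are swapped in here as a widely used technique; the card itself only says that variation across data sets measures imputation variability. With point estimates q_1..q_m and within-imputation variances U_1..U_m: qbar = mean(q), W = mean(U), B = sample variance of q, and T = W + (1 + 1/m)B. This is a minimal sketch; the function name and numbers are hypothetical.

```python
from statistics import mean, variance


def combine_imputations(estimates, within_vars):
    """Combine m completed-data analyses with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)     # combined point estimate
    W = mean(within_vars)       # average within-imputation variance
    B = variance(estimates)     # between-imputation variance (divisor m - 1)
    T = W + (1 + 1 / m) * B     # total variance, inflated for imputation uncertainty
    return q_bar, T


print(combine_imputations([10, 12, 14], [1, 1, 1]))
```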
Term
| Cluster sample definition |
|
Definition
A cluster sample is a probability sample in which a sampling unit is a cluster.
We will no longer assume SU = element |
|
|
Term
| Steps in 1-stage cluster sampling |
|
Definition
Divide the population (of K elements) into N total clusters.
Take a sample of n clusters.
Observe every element in each sampled cluster. |
|
|
Term
| Comparing 1-stage CS and STS |
|
Definition
1-stage CS: A block of cells is a cluster, SU is a cluster, don't sample from every cluster.
STS: A block of cells is a stratum, SU is an element, sample from every stratum. |
|
|
Term
| Why use cluster sampling? |
|
Definition
May not have a list of elements for a frame, but a list of clusters may be available.
May be cheaper to conduct the study if elements are clustered. |
|
|
Term
| Reasons that cluster sampling usually leads to less precise estimates. |
|
Definition
Elements within clusters tend to be correlated due to exposure to similar conditions.
We get less information than if we observe the same number of unrelated elements. |
|
|
Term
| Ways to define clusters for improved precision. |
|
Definition
Define clusters for which within-cluster variation is high (rarely possible).
Define clusters that are relatively small. |
|
|
Term
| Notation for cluster sampling (i, j, N, n, Mi, K) |
|
Definition
i = index for cluster i
(i, j) = index for element j in cluster i
N = total number of clusters in the population
n = number of sampled clusters
Mi = number of elements in cluster i
K = number of elements in the population (K = sum of Mi) |
|
|
Term
| Weight in CSE1 and is it self-weighting? |
|
Definition
N/n
Yes, it is self-weighting. |
|
|
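Under CSE1, every element carries weight N/n, so the unbiased total is (N/n) × sum of the sampled cluster totals ti. Its estimated variance is N^2 (1 - n/N) st^2/n, where st^2 is the sample variance of the ti. This is a minimal sketch of those standard one-stage formulas, assumed to match the formula sheet. The function name and data are hypothetical.

```python
from statistics import variance


def cse1_unbiased_total(cluster_totals, N):
    """One-stage cluster sample (SRSWOR of n of N clusters, all elements observed).

    Returns the unbiased estimate of the population total and its estimated variance.
    """
    n = len(cluster_totals)
    t_hat = (N / n) * sum(cluster_totals)        # each element carries weight N/n
    s2_t = variance(cluster_totals)              # sample variance of the cluster totals t_i
    var_t_hat = N ** 2 * (1 - n / N) * s2_t / n  # includes the fpc (1 - n/N)
    return t_hat, var_t_hat


t_hat, v = cse1_unbiased_total([10, 20, 30], N=30)
K = 600  # hypothetical number of elements; the per-element mean is t_hat / K
print(t_hat, t_hat / K, v)
```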
Term
| What is the weight formula for CSE2 and is it self weighting? |
|
Definition
(N/n)*(Mi/mi)
It is not always self-weighting. |
|
|
Term
| Cluster population mean and within-cluster variance formulas (not on sheet). |
|
Definition
ybariU = tiU/Mi
Si^2 = [1/(Mi - 1)] × sum over j of (yij - ybariU)^2 |
|
|
Term
| What is the weight formula for CSU1 and is it self-weighting? |
|
Definition
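wij = Qi/(n ψi) = Qi K/(n Mi), where Qi = number of times cluster i is selected and ψi = Mi/K
Not self-weighting in general, because the weight depends on Mi |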
|
|
Term
| What is the weight formula for CSU2 and is it self-weighting? |
|
Definition
K/(n mi)
Self-weighting only if mi is the same for all sampled clusters |
|
|
Term
| An element data set (cluster design) will have columns for at least what variables? |
|
Definition
Cluster id (i)
Element id within cluster (j)
Variable (yij) |
|
|
Term
| A cluster data set will have columns for at least what variables? |
|
Definition
Cluster id (i)
Cluster total under 1-stage CS (tiU)
Cluster mean under 1-stage CS (ybariU)
Within-cluster variance under 1-stage CS (Si^2) |
|
|
Term
| Biased (ratio) estimation for CSE1 |
|
Definition
Usually ti (cluster total) is positively correlated with Mi (cluster size)
No intercept (ratio model ti ≈ B × Mi, a line through the origin)
Notation of chapter 3 versus notation of chapter 5 ratio: yi (variable of interest) = ti (cluster total), xi (auxiliary info) = Mi (cluster size) |
|
|
Term
| What is MbarU in cluster sampling? |
|
Definition
The average cluster size for population
If unknown, can estimate with sample mean of cluster sizes Mbars = 1/n*sum(Mi) |
|
|
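The ratio estimator for CSE1 treats ti as the variable of interest and Mi as the auxiliary variable: ybar_r = sum(ti)/sum(Mi). This is a minimal sketch. Its variance line uses the linearized ratio variance (1 - n/N) × s_e^2/(n Mbar^2) with residuals e_i = ti - ybar_r × Mi, which is an assumption to check against the formula sheet. Mbar is MbarU when known, otherwise Mbars. The function name and data are hypothetical.

```python
def cse1_ratio_mean(cluster_totals, cluster_sizes, N, M_bar_U=None):
    """Ratio estimator of the per-element mean under one-stage cluster sampling."""
    n = len(cluster_totals)
    ybar_r = sum(cluster_totals) / sum(cluster_sizes)
    # Use MbarU if known; otherwise estimate it with Mbars = (1/n) * sum(M_i)
    M_bar = M_bar_U if M_bar_U is not None else sum(cluster_sizes) / n
    resid = [t - ybar_r * M for t, M in zip(cluster_totals, cluster_sizes)]
    s2_resid = sum(e ** 2 for e in resid) / (n - 1)
    var_ybar_r = (1 - n / N) * s2_resid / (n * M_bar ** 2)
    return ybar_r, var_ybar_r


print(cse1_ratio_mean([12, 20, 28], [2, 4, 6], N=30))
```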
Term
| 2-stage cluster sampling with equal selection probabilities (CSE2) overview |
|
Definition
Stage 1: Select clusters. SRSWOR of n PSUs from population of N PSUs.
Stage 2: Select elements within each sampled cluster. SRSWOR of mi SSUs from Mi elements in PSU i sampled in stage 1.
First stage sampling unit is a primary sampling unit (PSU) = cluster.
Second stage sampling unit is a secondary sampling unit (SSU) = element
Only collect data on the SSUs that were sampled from the cluster. |
|
|
Term
| Motivation for 2-stage cluster samples (instead of just 1-stage) |
|
Definition
Elements within a cluster are likely to be correlated.
It may be inefficient to observe all elements in a sampled PSU: the extra effort required to fully enumerate a PSU does not provide much extra information.
It may be better to spend resources to sample many PSUs and a small number of SSUs per PSU. (Possible opposing force: study costs associated with going to many clusters) |
|
|
Term
| The variance of t̂unb has two components associated with the two sampling stages. What are these components? |
|
Definition
1. Variation among PSUs
2. Variation among SSUs within PSUs |
|
|
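The two variance components can be computed separately. This is a minimal sketch of the standard CSE2 formulas with SRSWOR at both stages, assumed to match the formula sheet: t̂unb = (N/n) × sum of t̂i with t̂i = Mi × ybari. The between-PSU part is N^2 (1 - n/N) st^2/n. The within-PSU part is (N/n) × sum of (1 - mi/Mi) Mi^2 si^2/mi. Each sampled PSU needs mi ≥ 2. The function name and data are hypothetical.

```python
from statistics import mean, variance


def cse2_unbiased_total(samples, sizes, N):
    """Two-stage cluster sample with SRSWOR at both stages.

    samples: list of lists, the y-values of the m_i sampled SSUs in each sampled PSU
    sizes: M_i, the number of SSUs in each sampled PSU
    N: number of PSUs in the population
    """
    n = len(samples)
    t_i_hat = [M * mean(ys) for ys, M in zip(samples, sizes)]  # estimated PSU totals
    t_hat = (N / n) * sum(t_i_hat)
    # Component 1: variation among PSUs
    between = N ** 2 * (1 - n / N) * variance(t_i_hat) / n
    # Component 2: variation among SSUs within PSUs
    within = (N / n) * sum(
        (1 - len(ys) / M) * M ** 2 * variance(ys) / len(ys)
        for ys, M in zip(samples, sizes)
    )
    return t_hat, between + within, between, within


print(cse2_unbiased_total([[1, 3], [4, 6, 8]], sizes=[4, 6], N=10))
```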
Term
| T/F In 2-stage cluster sampling, equal probability at stage 1 plus equal probability at stage 2 (given PSU i) implies equal inclusion probability for an element. |
|
Definition
False
It does NOT imply equal (unconditional) inclusion probability for an element, because mi/Mi can differ across PSUs.
slide 65 of week 13 |
|
|
Term
| When to use unbiased estimation versus ratio estimation in CSE2? |
|
Definition
Unbiased estimation - Use if you know K or N
e.g. N= total number of clutches or K = total number of eggs in Minnedosa, Manitoba
Ratio estimation - Only requires knowledge of Mi (e.g. number of eggs in clutch i), in addition to data collected |
|
|
Term
| When will an unbiased estimator have poor precision in CSE2? |
|
Definition
When cluster sizes (Mi) are unequal and ti (cluster total) is roughly proportional to Mi (cluster size), so the ti vary greatly among clusters |
|
|
Term
| When will ratio estimation (biased) be precise in CSE2? |
|
Definition
When ti is roughly proportional to Mi (bigger cluster = larger ti)
This happens frequently in populations where cluster sizes (Mi) vary |
|
|
Term
| Inclusion probabilities for an element under 2-stage cluster sampling using SRSWOR at each stage (CSE2) |
|
Definition
πi = Pr{cluster i in sample} = n/N
πj|i = Pr{element j in sample GIVEN cluster i in sample} = mi/Mi
πij = Pr{element j AND cluster i in sample} = πi × πj|i = (n/N) × (mi/Mi) = n mi/(N Mi) |
|
|
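The inclusion probability and its reciprocal, the weight (N/n)(Mi/mi), show directly when CSE2 is self-weighting. This is a minimal sketch with hypothetical sizes. A fixed mi gives different weights in clusters of different size, while mi proportional to Mi equalizes them.

```python
def cse2_inclusion_and_weight(n, N, m_i, M_i):
    """Element inclusion probability and sampling weight under CSE2 (SRSWOR at both stages)."""
    pi_i = n / N                 # Pr{PSU i selected}
    pi_j_given_i = m_i / M_i     # Pr{SSU j selected | PSU i selected}
    pi_ij = pi_i * pi_j_given_i  # unconditional inclusion probability
    return pi_ij, 1 / pi_ij      # weight = (N/n) * (M_i/m_i)


# Same m_i = 4 in clusters of size 20 and 40: weights differ (about 50 vs 100)
print(cse2_inclusion_and_weight(5, 50, 4, 20)[1], cse2_inclusion_and_weight(5, 50, 4, 40)[1])
# m_i proportional to M_i (4 of 20, 8 of 40): weights are equal (about 50)
print(cse2_inclusion_and_weight(5, 50, 4, 20)[1], cse2_inclusion_and_weight(5, 50, 8, 40)[1])
```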
Term
| CSE2 Self-weighting design |
|
Definition
Stage 1: Select n PSUs from N PSUs in pop using SRS
Stage 2: Choose mi proportional to Mi so that mi/Mi is constant, and use SRS to select the sample; if this is achieved, the design is self-weighting.
The sampling weight for SSU j in cluster i is then constant for all elements. Weights may vary slightly in practice, however, because it may not be possible for mi/Mi to equal 1/c exactly for all clusters. |
|
|
Term
| Why are self-weighting samples appealing? What is the caveat for variance estimation in self-weighting samples? |
|
Definition
They are appealing because you can use the simple mean estimator.
The caveat is that self-weighting gives no break on variance estimation: you must still use the proper variance formula for the sample design. |
|
|
Term
| Examples of self-weighting designs |
|
Definition
SRS
SYS
STS with proportional allocation
CSE1
CSE2 with mi proportional to Mi (c = Mi/mi constant) |
|
|
Term
| Why is there no ratio estimator for CSU designs? |
|
Definition
| There is no ratio estimator because Mi has already been incorporated into the first-stage selection probabilities (ψi = Mi/K) |
|
|
Term
| Why use unequal probability cluster samples? |
|
Definition
| Use unequal selection probabilities to sample clusters to save costs and improve precision for a given budget. |
|
|
Term
| How do you select clusters and elements in CSU2 design? |
|
Definition
Select cluster with PPSWR (stage 1)
Select elements with SRSWOR (stage 2) |
|
|
Term
| What is the size or importance measure xi in CSU design? |
|
Definition
| Size or importance measure xi is Mi = number of elements or SSUs in PSU i |
|
|
Term
| Selection probability for PSU i in CSU1 |
|
Definition
| ψi = Mi/K, the probability of selecting PSU i on each PPSWR draw |
|
|
Term
| Is t̂ψ a biased or unbiased estimator of t? |
|
Definition
Unbiased
Variance estimator is also unbiased.
Also holds for population mean. |
|
|
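Under PPSWR with ψi = Mi/K, t̂ψ is the average over the n draws of ti/ψi, with repeated clusters counted once per draw. Its variance estimator is the sample variance of those n values divided by n. This is a minimal sketch of that Hansen-Hurwitz form. For CSU2, pass the estimated PSU totals t̂i = Mi × ybari instead of ti. The same formula then captures both between- and within-cluster variation. The function name and data are hypothetical.

```python
from statistics import mean, variance


def pps_wr_total(draw_totals, draw_sizes, K):
    """Estimate of t under PPSWR with psi_i = M_i / K.

    draw_totals: cluster total (or estimated total) for each of the n draws, repeats included
    draw_sizes: M_i for each draw
    """
    n = len(draw_totals)
    z = [t * K / M for t, M in zip(draw_totals, draw_sizes)]  # t_i / psi_i for each draw
    t_hat = mean(z)
    var_t_hat = variance(z) / n
    return t_hat, var_t_hat


print(pps_wr_total([10, 40], [5, 10], K=100))
```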
Term
| T/F In CSU2 the variance estimator captures both between and within cluster variance. |
|
Definition
True
Because we use WR sampling, variance estimator captures both between and within cluster variance.
This holds for the estimator for the population mean. |
|
|
Term
| Is CSU2 (with ψi = Mi/K) self-weighting? |
|
Definition
| Yes, if mi is constant across clusters |
|
|