Shared Flashcard Set

Details

Title

Stat 421

Description

Final Exam

Total Cards

Subject

Mathematics

Level

Graduate

Created

04/20/2012

Term

Goal of domain estimation

Definition

To estimate and compare subpopulation (i.e., domain) parameters.

Term

When to use domain estimation

Definition

When we want to estimate and compare subpopulation (domain) parameters, but the sample comes from an SRSWOR design that was not specifically designed to estimate parameters for the domains.

Term

T/F Population sizes must be known for domain estimation

Definition

False. The population sizes may be unknown.

Term

Is sample size of a domain in domain estimation fixed or random?

Definition

Random, unlike stratification, where the allocation is fixed. We don't know n_d until after the data have been collected. The value of n_d changes from sample to sample.

Term

T/F The value of n_d is the same from sample to sample in domain estimation.

Definition

False. The value of n_d changes from sample to sample in domain estimation.

Term

Is the total sample size in domain estimation fixed or random?

Definition

fixed

Term

U_d = what in domain estimation?

Definition

index set for population domain

Term

A_d = what in domain estimation?

Definition

index set for sample domain

Term

look at population parameter formulas for domain estimation on slide nine of week 11

Definition

Term

T/F Domain estimation is a good estimation when N_d is known.

Definition

False

If N_d were known, then we would want to use STS instead.

Term

Variables u and x in domain estimation

Definition

Numerator variable u: u_i = y_i if i is in U_d, and 0 otherwise (the data value for units in the domain; slide 12, week 11).

Denominator variable x: x_i = 1 if i is in U_d, and 0 otherwise (indicator of domain membership).
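
As a rough illustration of how u and x are used, the sketch below builds both variables for a small made-up sample and estimates the domain mean as the ratio of their sums. The data values and the `domain` flags are hypothetical, and the ratio form is an assumption about how the course applies these variables.

```python
# Hypothetical SRS of n = 6 units; `domain` flags membership in U_d.
y = [10.0, 12.0, 7.0, 9.0, 15.0, 11.0]
domain = [True, False, True, True, False, False]

# Numerator variable: u_i = y_i if unit i is in the domain, else 0.
u = [yi if d else 0.0 for yi, d in zip(y, domain)]
# Denominator variable: x_i = 1 if unit i is in the domain, else 0.
x = [1 if d else 0 for d in domain]

n_d = sum(x)              # domain sample size (random, known only after sampling)
ybar_d = sum(u) / sum(x)  # ratio-form estimate of the domain mean
print(n_d, ybar_d)
```

Note that n_d falls out of the data here, which is why it changes from sample to sample.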

Term

*Study all formulas on formula sheet! Know what is what!*

Definition

Term

What are the null and alternative hypotheses when testing whether two domain population means are equal?

Definition

H₀: ybar_U1 = ybar_U2

H₁: ybar_U1 ≠ ybar_U2

Equivalently:

H₀: ybar_U1 - ybar_U2 = 0

H₁: ybar_U1 - ybar_U2 ≠ 0

Term

Formula for z-test statistic: (need to know???)

Definition

z = (ybar₁ - ybar₂)/sqrt(V(ybar₁)+V(ybar₂))

Reject H₀ if abs(z) > z_alpha/2
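
A minimal sketch of this test, assuming the two domain means and their estimated variances have already been computed. The function name and the input numbers are hypothetical; 1.96 is the usual two-sided critical value for alpha = 0.05.

```python
import math

def z_test_domain_means(ybar1, ybar2, v1, v2, z_crit=1.96):
    """Return (z, reject) for H0: the two domain means are equal."""
    z = (ybar1 - ybar2) / math.sqrt(v1 + v2)
    return z, abs(z) > z_crit

# Hypothetical domain estimates and estimated variances.
z, reject = z_test_domain_means(52.0, 47.0, 2.25, 1.75)
print(z, reject)
```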

Term

What is the formula for calculating a confidence interval?

Definition

Θ ± z_alpha/2 × SE(Θ)
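
The interval can be computed directly from an estimate and its standard error. This is only a sketch: the function name and the example numbers are made up, and 1.96 assumes a 95% interval.

```python
def confidence_interval(theta_hat, se, z_crit=1.96):
    """Return (lower, upper) for theta_hat +/- z_crit * se."""
    half_width = z_crit * se
    return theta_hat - half_width, theta_hat + half_width

# Hypothetical estimate and standard error.
lower, upper = confidence_interval(8.5, 0.5)
print(lower, upper)
```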

Term

Impact of nonresponse (2)

Definition

Potential bias

Loss of precision

Term

Strategies to reduce nonresponse (NR)

Definition

Design phase.

After data collection: call-backs, post-stratification, imputation

Term

When using post data collection strategies (3) to reduce nonresponse, what types of nonresponse are each of the three usually used to fix?

Definition

Call-backs fix both unit and item non-response.

Post-stratification - unit non-response primarily.

Imputation - item non-response primarily.

Term

Two types of nonresponse

Definition

Unit

Item

Term

Formula for response rate

Definition

n_R/n

where n_R = realized sample size

Term

Nonresponse framework and population parameters

Definition

[image]

Term

Nonresponse sample framework. Graphic of N, M, R, N_H, N_R, n_M, n_R, n

Definition

[image]

Term

Nonresponse bias

Definition

Occurs when differences exist between the population mean of y for the nonresponding subpopulation ybar_MU and the population mean of y for the responding subpopulation ybar_RU

Term

What does the magnitude of nonresponse bias depend on?

Definition

Differences between population means

Nonresponse rate

Term

How does nonresponse reduce precision and how can you remedy this?

Definition

Sample size reductions due to NR affect precision by increasing variances.

Remedy by anticipating and designing for NR sample size attrition

Method: divide the desired target sample size (n) by the guessed proportion of respondents (R = N_R/N). Formula: n/R
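
The n/R adjustment is simple arithmetic. The sketch below rounds up to a whole number of units, which is an added assumption; the function name and the example inputs are hypothetical.

```python
import math

def inflated_sample_size(n_target, response_rate):
    """Units to sample so that about n_target respond, given a guessed R."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(n_target / response_rate)

# Want 300 respondents and guess that 75% will respond.
n_to_sample = inflated_sample_size(300, 0.75)
print(n_to_sample)
```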

Term

What is the best strategy for addressing nonresponse bias?

Definition

Design survey to prevent NR

Term

Using data from call-backs of NR cases to adjust for bias - steps in process.

Definition

Select a sample from the nonrespondents to the survey.

Collect data from contacted nonrespondents.

Use these data to estimate population mean for nonrespondents ybar_MU.

Estimate population mean for whole population ybar_U with a weighted combination of respondent sample mean and nonrespondent sample mean.
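
One common way to form the weighted combination is to weight each mean by its share of the original sample, n_R/n for respondents and n_M/n for nonrespondents. That choice of weights is an assumption here (the card does not give them), and all numbers are made up.

```python
def callback_mean(n, n_R, ybar_R, ybar_M):
    """Combine respondent and call-back means, weighting by sample shares."""
    n_M = n - n_R  # nonrespondents in the original sample
    return (n_R / n) * ybar_R + (n_M / n) * ybar_M

# Hypothetical survey: 100 sampled, 60 responded; the call-back
# subsample of nonrespondents gave a mean of 30.0.
ybar_hat = callback_mean(n=100, n_R=60, ybar_R=20.0, ybar_M=30.0)
print(ybar_hat)
```

If respondents and nonrespondents differ, as in this example, the combined estimate moves away from the respondent-only mean.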

Term

Is the estimator of population mean using the callback method to deal with nonresponse biased or unbiased?

Definition

Unbiased

Term

Post-stratification as a remedy for nonresponse - steps

Definition

Divide population into H mutually exclusive and exhaustive post-strata.

For each post-stratum: Know post-stratum sizes N_h, estimate characteristics of post-strata, use post-stratum sample mean to estimate post-stratum population mean, pool post-stratum estimates using, for example, a weighted mean of the post-stratum estimates.

Term

What is the formula for the sampling weight in post-stratification design?

Definition

w_hj = N_h/n_hr
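
A small sketch of this weight in use, assuming n_hr is the number of respondents in post-stratum h. The post-stratum names, sizes, and responses are hypothetical.

```python
# Known post-stratum population sizes N_h (hypothetical).
N_h = {"urban": 6000, "rural": 4000}
# Responses observed in each post-stratum (hypothetical).
respondents = {"urban": [10.0, 12.0, 14.0], "rural": [20.0, 22.0]}

# w_hj = N_h / n_hr: every respondent in post-stratum h gets the same weight.
weights = {h: N_h[h] / len(ys) for h, ys in respondents.items()}

# Weighted total, then the mean over the whole population.
t_hat = sum(weights[h] * sum(ys) for h, ys in respondents.items())
ybar_post = t_hat / sum(N_h.values())
print(weights, ybar_post)
```

The result equals the post-stratum means (12 and 21) weighted by N_h/N (0.6 and 0.4).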

Term

Assumptions in post-stratification adjustment.

Definition

Distribution of y is approximately equal for responding portion of post-stratum population and nonresponding portion of post-stratum population.

Term

What does imputation as a strategy for dealing with nonresponse do?

Definition

A statistical method for "filling in" or "predicting" missing values.

Impute values so that they represent the distribution of the response variable with missing data (y).

Impute values using a method that supports estimation of the variance associated with the random components of the imputation process.

Term

Imputation methods (5)

Definition

Deductive imputation

Cell mean imputation

hot-deck imputation (random)

regression imputation

multiple imputation

Term

Deductive Imputation

Definition

common method, rarely implementable

Use a deterministic rule to assign a value (e.g. crime victim: no = violent crime victim: no)

There must be sufficient information to identify the missing value with a high degree of certainty.

Relatively uncommon, especially with use of computer-assisted survey instruments when checks for these relationships are embedded in the computer-based questionnaire.

Term

Cell mean imputation

Definition

Avoid: leads to incorrect distribution of y in dataset.

Divide responding units into imputation classes.

Within a given imputation class: calculate the average value for available item data in class, fill in missing value for nonresponding unit with average value.

Retains mean estimate for an imputation class. Underestimates variance within an imputation class, which misrepresents distribution of y.
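
The sketch below illustrates both effects on a toy data set: the class mean is preserved, but the variance computed from the completed data is smaller than the variance of the observed values. Class labels and values are hypothetical, and `None` marks a missing item.

```python
import statistics

# Hypothetical imputation classes; None marks a missing item.
cells = {"A": [4.0, 6.0, None, 8.0], "B": [10.0, None, 14.0]}

imputed = {}
for cell, values in cells.items():
    observed = [v for v in values if v is not None]
    cell_mean = statistics.mean(observed)
    # Fill each missing value with the class average.
    imputed[cell] = [cell_mean if v is None else v for v in values]

var_observed = statistics.variance([4.0, 6.0, 8.0])
var_imputed = statistics.variance(imputed["A"])
print(imputed, var_observed, var_imputed)
```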

Term

Hot-deck imputation

Definition

Most common and generally applicable.

May apply within groups of respondents (auxiliary info).

Divide responding units into imputation classes. Within a given imputation class: randomly select a donor from responding units in class, and fill in the missing value for the nonresponding unit with the value from the donor unit.

Retains variation in individual values. Can impute many variables from the same donor. Variations exist.
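
A minimal random hot-deck sketch for a single variable, assuming imputation classes are already defined. The function name, class labels, and data are hypothetical; the seed only makes the run repeatable.

```python
import random

def hot_deck(values, rng):
    """Fill each missing value (None) with a randomly chosen donor value."""
    donors = [v for v in values if v is not None]
    return [rng.choice(donors) if v is None else v for v in values]

rng = random.Random(0)
# Hypothetical imputation classes; None marks a missing item.
cells = {"A": [4.0, 6.0, None, 8.0], "B": [10.0, None, 14.0]}
completed = {cell: hot_deck(values, rng) for cell, values in cells.items()}
print(completed)
```

Because imputed values are actual observed values from the same class, the spread of individual values is kept rather than collapsed to the class mean.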

Term

Regression imputation

Definition

Uses model to incorporate auxiliary information, between hot-deck and cell mean imputation methods.

Use a regression model to relate covariate(s) to variable with missing data.

Estimate regression parameters with data from responding units, fill in missing value with predicted value.

Useful if a strong relationship exists that provides a better predicted value for the missing data. A form of (conditional) mean imputation. Requires a separate model for each variable with missing data.
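
As an illustration, the sketch below fits a simple least-squares line to the responding units and fills in a missing y with the predicted value. A single covariate with an intercept is an assumption made for brevity, and the data are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Hypothetical covariate x (fully observed) and response y (one missing).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, None, 8.0]

# Estimate regression parameters from responding units only.
pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
a, b = fit_line([p[0] for p in pairs], [p[1] for p in pairs])

# Fill in the missing value with the predicted value.
y_imputed = [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]
print(y_imputed)
```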

Term

Multiple Imputation

Definition

Accounting for variation due to imputation process.

Decide on an imputation model, impute m>1 values for each missing data item, result is m (different) data sets with no missing values.

Variation in estimates across data sets provides an estimate of the variability associated with the imputation process, analysis is more complex.
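
A rough sketch of the idea, using a random hot-deck draw as the imputation model (an assumption; any stochastic imputation model could be used). It creates m completed data sets, computes the mean from each, and looks at how much the estimates vary across data sets. The data and seed are hypothetical.

```python
import random
import statistics

# Hypothetical data; None marks a missing item.
y = [4.0, 6.0, None, 8.0, None, 10.0]
donors = [v for v in y if v is not None]

rng = random.Random(1)
m = 5  # number of imputed data sets, m > 1

estimates = []
for _ in range(m):
    # One completed data set with no missing values.
    completed = [rng.choice(donors) if v is None else v for v in y]
    estimates.append(statistics.mean(completed))

# Spread of the estimates across the m data sets reflects
# variability due to the imputation process.
between_var = statistics.variance(estimates)
print(estimates, between_var)
```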

Term

Cluster sample definition

Definition

A cluster sample is a probability sample in which a sampling unit is a cluster.

We will no longer assume SU = element

Term

Steps in 1-stage cluster sampling

Definition

Divide the population (of K elements) into N total clusters.

Take a sample of n clusters.

Term

Comparing 1-stage CS and STS

Definition

1-stage CS: A block of cells is a cluster, SU is a cluster, don't sample from every cluster.

STS: A block of cells is a stratum, SU is an element, sample from every stratum.

Term

Why use cluster sampling?

Definition

May not have a list of elements for a frame, but a list of clusters may be available.

May be cheaper to conduct the study if elements are clustered.

Term

Reasons that cluster sampling usually leads to less precise estimates.

Definition

Elements within clusters tend to be correlated due to exposure to similar conditions.

We get less information than if we observe the same number of unrelated elements.

Term

Ways to define clusters for improved precision.

Definition

Define clusters for which within-cluster variation is high (rarely possible).

Define clusters that are relatively small.

Term

Notation for cluster sampling (i, j, N, n, M_i, K)

Definition

i = index for cluster i

i, j = index for element j in cluster i

N = total clusters in population

n = sampled clusters

M_i = number of elements in cluster i

K = number of elements in population (sum M_i)

Term

Weight in CSE1 and is it self-weighting?

Definition

N/n

Yes, it is self-weighting.

Term

What is the weight formula for CSE2 and is it self weighting?

Definition

(N/n)*(M_i/m_i)

It is not always self-weighting.

Term

Cluster population mean and within-cluster variance formulas (not on sheet).

Definition

ybar_iU = t_iU/M_i

S_i² = [1/(M_i - 1)] * sum_j[(y_ij - ybar_iU)²]
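
Both formulas can be checked on a single small cluster. The values below are hypothetical, and the variable names simply mirror the notation on the card.

```python
# Hypothetical y values for all M_i elements of one cluster i.
y_i = [3.0, 5.0, 7.0, 9.0]

M_i = len(y_i)        # cluster size
t_iU = sum(y_i)       # cluster total
ybar_iU = t_iU / M_i  # cluster population mean

# Within-cluster variance with the M_i - 1 divisor.
S2_i = sum((y - ybar_iU) ** 2 for y in y_i) / (M_i - 1)
print(ybar_iU, S2_i)
```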

Term

What is the weight formula for CSU1 and is it self-weighting?

Definition

Q_i/(nψ_i) = Q_i K/(n M_i)

Not always self weighting (??)

Term

What is the weight formula for CSU2 and is it self-weighting?

Definition

K/(n m_i)

Not always self weighting depending on m_i (??)

Term

An element data set (cluster design) will have columns for at least what variables?

Definition

Cluster id (i)

Element id within cluster (j)

Variable (y_ij)

Term

A cluster data set will have columns for at least what variables?

Definition

Cluster id (i)

Cluster total under 1-stage CS (t_iU)

Cluster mean under 1-stage CS (ybar_iU)

Within-cluster variance under 1-stage CS (s_i²)

Term

Biased (ratio) estimation for CSE1

Definition

Usually t_i (cluster total) is positively correlated with M_i (cluster size)

No intercept

Notation of chapter 3 versus notation of chapter 5 ratio: y_i (variable of interest) = t_i(cluster total), x_i (auxiliary info) = M_i (cluster size)

Term

What is Mbar_U in cluster sampling?

Definition

The average cluster size for population

If unknown, can estimate with sample mean of cluster sizes Mbar_s = 1/n*sum(M_i)
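
A short sketch using a cluster-level data set, with t_i playing the role of y_i and M_i the role of x_i. The ratio form (sum of sampled totals over sum of sampled sizes) is the usual no-intercept ratio estimator of the mean per element and is assumed here; the sample values are made up.

```python
# Hypothetical sample of n = 3 clusters from a 1-stage cluster sample.
M = [10, 20, 30]          # cluster sizes M_i (auxiliary variable x_i)
t = [50.0, 110.0, 140.0]  # cluster totals t_i (variable of interest y_i)
n = len(M)

# Ratio estimate of the population mean per element.
ybar_ratio = sum(t) / sum(M)

# Sample mean of cluster sizes, usable when Mbar_U is unknown.
Mbar_s = sum(M) / n
print(ybar_ratio, Mbar_s)
```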

Term

2-stage cluster sampling with equal selection probabilities (CSE2) overview

Definition

Stage 1: Select clusters. SRSWOR of n PSUs from population of N PSUs.

Stage 2: Select elements within each sampled cluster. SRSWOR of m_i SSUs from M_i elements in PSU i sampled in stage 1.

First stage sampling unit is a primary sampling unit (PSU) = cluster.

Second stage sampling unit is a secondary sampling unit (SSU) = element

Only collect data on the SSUs that were sampled from the cluster.

Term

Motivation for 2-stage cluster samples (instead of just 1-stage)

Definition

Likely that elements in cluster will be correlated.

May be inefficient to observe all elements in a sample PSU: the extra effort required to fully enumerate a PSU does not provide that much extra information.

May be better to spend resources to sample many PSUs and a small number of SSUs per PSU. (Possible opposing force: study costs associated with going to many clusters)

Term

The variance of that_unb has 2 components associated with the 2 sampling stages, what are these components?

Definition

1. Variation among PSUs

2. Variation among SSUs within PSUs

[image]

Term

T/F Equal probability at stage 1 plus equal probability in stage 2 given PSU i in 2-stage cluster sampling implies equal inclusion probability for an element.

Definition

False

It does NOT imply equal inclusion probability for an element (unconditional probability for element)

slide 65 of week 13

Term

When to use unbiased estimation versus ratio estimation in CSE2?

Definition

Unbiased estimation - Use if you know K or N

e.g. N= total number of clutches or K = total number of eggs in Minnedosa, Manitoba

Ratio estimation - Only requires knowledge of M_i (e.g. number of eggs in clutch i), in addition to data collected

Term

When will an unbiased estimator have poor precision in CSE2?

Definition

When cluster sizes (M_i) are unequal

t_i (cluster total) is roughly proportional to M_i(cluster size)

Term

When will ratio estimation (biased) be precise in CSE2?

Definition

When t_i is roughly proportional to M_i (bigger cluster = larger t_i)

This happens frequently in pops where cluster sizes (M_i) vary

Term

Inclusion probabilities for an element under 2-stage cluster sampling using SRSWOR at each stage (CSE2)

Definition

π_i = Pr{cluster i in sample} = n/N

π_j|i = Pr{element j in sample GIVEN cluster i in sample} = m_i/M_i

π_ij = Pr{element j AND cluster i in sample} = π_i × π_j|i = (n/N) × (m_i/M_i) = (n m_i)/(N M_i)
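
To see why equal probabilities at each stage need not give equal element-level probabilities, the sketch below computes π_ij and the weight 1/π_ij for three hypothetical clusters. Exact fractions are used to avoid rounding; all sizes are made up.

```python
from fractions import Fraction

N, n = 50, 5  # PSUs in the population and in the sample (hypothetical)
clusters = [(40, 8), (20, 4), (30, 10)]  # (M_i, m_i) for three sampled PSUs

weights = []
for M_i, m_i in clusters:
    pi_i = Fraction(n, N)              # Pr{cluster i in sample}
    pi_j_given_i = Fraction(m_i, M_i)  # Pr{element j | cluster i in sample}
    pi_ij = pi_i * pi_j_given_i        # unconditional inclusion probability
    weights.append(1 / pi_ij)          # weight = (N/n) * (M_i/m_i)

# The design is self-weighting only if every element has the same weight.
self_weighting = len(set(weights)) == 1
print(weights, self_weighting)
```

The first two clusters share the ratio m_i/M_i = 1/5 and so share a weight; the third has m_i/M_i = 1/3, so its elements get a different weight.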

Term

CSE2 Self-weighting design

Definition

Stage 1: Select n PSUs from N PSUs in pop using SRS

Stage 2: Choose m_i proportional to M_i so that m_i/M_i is constant, use SRS to select sample - if this is achieved, then it is self-weighting

Sample weight for SSU j in cluster i is constant for all elements. Weight may vary slightly in practice, however, because it may not be possible for m_i/M_i to be equal to 1/c for all clusters.

Term

Why are self-weighting samples appealing? What is the caveat for variance estimation in self-weighting samples?

Definition

They are appealing because you can use the simple mean estimator.

The caveat for variance estimation is that there is no break on the variance of the estimator: you must still use the proper variance estimation formula for the sample design.

Term

Self-weighting designs

Definition

SRS

SYS

STS with proportional allocation

CSE1

CSE2 with m_i proportional to M_i or c = M_i/m_i

Term

Why is there no ratio estimator for CSU designs?

Definition

There is no ratio estimator because M_i has already been incorporated in the first stage

Term

Why use unequal probability cluster samples?

Definition

Use unequal selection probabilities to sample clusters to save costs and improve precision for a given budget.

Term

How do you select clusters and elements in CSU2 design?

Definition

Select clusters with PPSWR (stage 1)

Select elements with SRSWOR (stage 2)

Term

What is the size or importance measure x_i in CSU design?

Definition

Size or importance measure x_i is M_i = number of elements or SSUs in PSU i

Term

Selection probability for PSU i in CSU1

Definition

ψ_i = M_i/K
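
A sketch of drawing PSUs with replacement using these selection probabilities, followed by the with-replacement total estimate (1/n) × sum(t_i/ψ_i). That estimator form is the standard one for PPSWR and is assumed here rather than taken from the card. Cluster sizes and totals are hypothetical, chosen so that t_i is exactly proportional to M_i.

```python
import random

# Hypothetical population of N = 4 clusters.
M = [10, 20, 30, 40]              # cluster sizes M_i
t = [50.0, 100.0, 150.0, 200.0]   # cluster totals t_i (here t_i = 5 * M_i)
K = sum(M)                        # number of elements in the population

# Selection probability for PSU i: psi_i = M_i / K.
psi = [M_i / K for M_i in M]

# Draw n PSUs with replacement, with probability proportional to size.
rng = random.Random(42)
n = 3
draws = rng.choices(range(len(M)), weights=psi, k=n)

# Average of t_i / psi_i over the n draws estimates the population total.
t_hat_psi = sum(t[i] / psi[i] for i in draws) / n
print(draws, t_hat_psi)
```

Because t_i is exactly proportional to M_i in this toy population, every draw gives t_i/ψ_i = 500, so the estimate equals the true total whichever clusters are drawn.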

Term

Is that_ψ a biased or unbiased estimator of t?

Definition

Unbiased

Variance estimator is also unbiased.

Also holds for population mean.

Term

T/F In CSU2 the variance estimator captures both between and within cluster variance.

Definition

True

Because we use WR sampling, variance estimator captures both between and within cluster variance.

This holds for the estimator for the population mean.

Term

Is CSU self-weighting?

Definition

Yes, if m_i is constant across clusters
