Term
| Observational unit |
|
Definition
| basic unit/individual that we are describing in the study |
|
|
Term
| Variable |
|
Definition
| data we are recording for each observational unit |
|
|
Term
| Qualitative variable |
|
Definition
| categorical (not-numbers) |
|
|
Term
| Quantitative variable |
|
Definition
| numerical (numbers) |
|
|
Term
| Observational study |
|
Definition
| the investigator simply records what is/has happened |
|
|
Term
| Experiment |
|
Definition
| the investigator imposes a treatment on the observational units |
|
|
Term
| Sample |
|
Definition
| The observational units on which we have data (***if we gave someone a questionnaire but they didn’t return it, they don’t count!***) |
|
|
Term
| Sampling frame |
|
Definition
| All observational units who had a chance of being selected in the sample |
|
|
Term
| Population |
|
Definition
The group of observational units we are ultimately trying to describe.
The population will depend on the question being asked
Sometimes the sample/sampling frame/population can be the same group. That is called a census. |
|
|
Term
| Parameter |
|
Definition
truth about the population
***Will almost always be in % unless the question explicitly asks for a number***
We usually do not know the true number, but we can describe it in words (% of UW students who live on campus) |
|
|
Term
| Statistic |
|
Definition
describes the sample
May not be given in % format, but should be converted to match the format of the parameter
We will usually be able to calculate this from the data given |
|
|
Term
| Parameter vs. Statistic |
|
Definition
Parameter: describes the population; fixed, will not change; true value may be unknown
Statistic: describes the sample; will vary when different samples are taken; can be computed from the information given |
|
|
Term
| Probability sample |
|
Definition
| includes SRS. Any type of design in which randomization is used to pick the observational units |
|
|
Term
| Convenience sample |
|
Definition
the investigator selects which observational units will be in the sample
**almost always biased** |
|
|
Term
| Voluntary response sample |
|
Definition
the observational units choose whether they want to be in the sample or not
**almost always biased** |
|
|
Term
| Bias vs. Variability |
|
Definition
Think of variability as how spread out my estimates are
Think of bias as how far away my estimates are from the truth
They are not one against the other: both can be high, both can be low, or one can be high while the other is low |
|
|
Term
| Sources of Variability (4) |
|
Definition
Random sampling error (sampling variability). ***This is the only variability accounted for by the margin of error.*** Any additional bias or variability caused by poor survey design will add extra variability.
Shortcut method for a 95% confidence interval: p̂ ± 1/√n, where n = sample size (see the sketch after this card).
Confidence statement: we are 95% confident that the true parameter lies within the confidence interval. ***95% of the time that I follow this same procedure and construct a confidence interval, it will cover the true parameter.***
When the sample size increases we can be more sure about our estimate, so we do not need as large a margin of error. |
|
|
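A minimal Python sketch of the shortcut interval on the card above; the sample proportion and sample size are made-up numbers for illustration:

```python
import math

# Shortcut 95% confidence interval from the card: p-hat +/- 1/sqrt(n).
p_hat = 0.62   # sample proportion (hypothetical survey result)
n = 400        # sample size (hypothetical)

margin_of_error = 1 / math.sqrt(n)
low, high = p_hat - margin_of_error, p_hat + margin_of_error

print(f"Margin of error: {margin_of_error:.3f}")   # 0.050
print(f"95% CI: ({low:.3f}, {high:.3f})")          # (0.570, 0.670)
```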
Term
| Sources of Bias (2; 1 with 4 possible) |
|
Definition
Undercoverage: when the sampling frame does not accurately reflect the population (ex. random digit dialing won’t include people without phones)
Non-sampling errors:
o Response error: people don’t answer truthfully (ex: how many times have you cheated on a test?)
o Non-response: when people don’t respond because they can’t be contacted or don’t cooperate
o Processing errors: typos when recording data
o Question wording: confusing questions, or questions which make a certain response more likely (leading questions) |
|
|
Term
| Explanatory variable |
|
Definition
| a variable that may cause a change in the response variable; the cause, usually the X variable |
|
|
Term
| Response variable |
|
Definition
| measures the outcome of an experiment; the effect, usually the Y variable |
|
|
Term
| Treatment |
|
Definition
| specific condition that is applied in an experiment; often the explanatory variable or mix of explanatory variables |
|
|
Term
| Lurking variable |
|
Definition
| variable that may have an effect on the response variable but is not measured |
|
|
Term
| Confounding |
|
Definition
| When two variables have effects on the response variable that cannot be distinguished from each other |
|
|
Term
| Statistically significant |
|
Definition
| ***The result we found would rarely occur simply by chance*** |
|
|
Term
| Placebo effect |
|
Definition
| The benefit derived from the psychological effect of receiving a treatment |
|
|
Term
| Double-blind experiment |
|
Definition
| Both the clinicians and subjects are “blind” to whether they are in the control or treatment group |
|
|
Term
| Randomization |
|
Definition
| using impersonal chance to assign subjects to either the treatment or control group |
|
|
Term
| Shape |
|
Definition
| Is the distribution skewed or symmetric? Is there one mode or multiple modes? |
|
|
Term
| Spread |
|
Definition
| where do most of the observations lie? What are the highest/lowest values? |
|
|
Term
| Center |
|
Definition
| What is the center point of the distribution? (mean, median or mode) |
|
|
Term
| Numerical Descriptions: Mean |
|
Definition
| Add up all observations and then divide the total by the number of observations. Highly affected by outliers; changes when you add to or multiply the data |
|
|
Term
| Numerical Descriptions: Median |
|
Definition
| Midpoint of the distribution. Sort all your observations and choose the middle observation, or average the middle two if there is an even number of observations. Less affected by outliers; changes when you add to or multiply the data |
|
|
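A quick Python sketch contrasting the two cards above, on a made-up data set: one extreme outlier pulls the mean but barely moves the median.

```python
# Median helper: sort, take the middle observation, or average
# the middle two when there is an even number of observations.
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

data = [2, 3, 3, 4, 5]                    # made-up observations
print(sum(data) / len(data), median(data))              # mean 3.4, median 3

data_out = data + [100]                   # add one extreme outlier
print(sum(data_out) / len(data_out), median(data_out))  # mean 19.5, median 3.5
```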
Term
| Numerical Descriptions: Mode |
|
Definition
| The most frequently occurring value in the distribution |
|
Term
| Numerical Descriptions: Percentiles |
|
Definition
The cth percentile of a distribution is defined so that (at least) c% of the observations are at or below it and (at least) (100-c)% of the observations are at or above it |
|
|
Term
| Numerical Descriptions: Five Number summary |
|
Definition
| Min, 25%, Median, 75%, Max |
|
|
Term
| Numerical Descriptions: Standard Deviation |
|
Definition
a measure of how spread out the data are. 68% of all observations lie within ±1 sd of the mean, 95% within 2 sd, 99.7% within 3 sd; changes when you add to or multiply the data
o First find x̄ (the mean)
o Then add up (x − x̄)² for each observation
o Divide that total by n − 1
o Take the square root of that ratio |
|
|
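A minimal Python sketch of the four steps above, on a made-up data set:

```python
import math

data = [4, 7, 6, 3, 5]                       # made-up observations

x_bar = sum(data) / len(data)                # step 1: find the mean
ss = sum((x - x_bar) ** 2 for x in data)     # step 2: sum of squared deviations
variance = ss / (len(data) - 1)              # step 3: divide by n - 1
sd = math.sqrt(variance)                     # step 4: take the square root

print(f"mean = {x_bar}, sd = {sd:.3f}")      # mean = 5.0, sd = 1.581
```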
Term
| Numerical Descriptions: Quartiles |
|
Definition
• At least 25% of observations are ≤ 1st Quartile, and at least 75% of observations are ≥ 1st quartile
• At least 75% of observations are ≤ 3rd Quartile, and at least 25% of observations are ≥ 3rd quartile
• Interquartile range = 3rd quartile – 1st quartile
• Changes when you add to or multiply the data |
|
|
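A short Python sketch of the five-number summary and IQR on made-up data. Note that statistics.quantiles implements one of several textbook conventions for quartiles, so results can differ slightly from hand calculations:

```python
import statistics

data = [1, 3, 4, 5, 5, 6, 7, 9, 12]               # made-up observations

q1, median, q3 = statistics.quantiles(data, n=4)  # quartile cut points
five_number = (min(data), q1, median, q3, max(data))
iqr = q3 - q1                                     # interquartile range

print("five-number summary:", five_number)
print("IQR:", iqr)
```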
Term
| Scatterplot |
|
Definition
| plots two variables on same graph. Each point is one individual observation |
|
|
Term
| Correlation (r) |
|
Definition
measures the “strength” of the relationship between two variables
• Always between -1 and 1
• Positive correlation means positive association (as one increases, so does the other). Negative value means negative association (as one increases, the other decreases)
• ***Correlation does not imply causation!!***
• Must be linear (or football shaped) to be a valid measurement of association. No outliers |
|
|
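A small Python sketch computing r as the average product of standard scores (one standard definition of correlation), on made-up (x, y) pairs:

```python
import math

xs = [1, 2, 3, 4, 5]        # made-up explanatory values
ys = [2, 4, 5, 4, 6]        # made-up response values

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

mx, my, sx, sy = mean(xs), mean(ys), sd(xs), sd(ys)
n = len(xs)

# r = average product of the standard scores of x and y
r = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)
print(f"r = {r:.3f}")   # about 0.853: positive association
```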
Term
| Ecological correlation |
|
Definition
| correlations based on averages or rates. Usually overstates the correlation |
|
|
Term
| Regression standard deviation |
|
Definition
| Regression sd is the “average size of error”: √(1 − r²) × s_y. ***Only use this when you are making a prediction involving prior information*** (think about the quiz: when we picked a random student and guessed their quiz 2 score, we used the quiz 2 average and sd, but when we knew their quiz 1 score, we used the regression sd) |
|
|
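A tiny Python sketch of the regression sd formula above; the correlation and sd values are hypothetical, not the actual quiz numbers:

```python
import math

r = 0.6      # correlation between quiz 1 and quiz 2 (hypothetical)
s_y = 10.0   # sd of quiz 2 scores (hypothetical)

# Regression sd = sqrt(1 - r^2) * s_y: the "average size of error"
# when predicting y using prior information about x.
regression_sd = math.sqrt(1 - r ** 2) * s_y
print(f"regression sd = {regression_sd:.1f}")  # 8.0, smaller than s_y:
# knowing x shrinks the prediction error
```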
Term
| Regression effect |
|
Definition
| observations that are extreme in the X-direction are not as extreme in the Y-direction |
|
|
Term
| Rules of probability |
|
Definition
P(A) must be between 0 and 1
Total probability must add up to 1
P(A not happening) = 1 − P(A) |
|
|
Term
| Normal curve |
|
Definition
Symmetric, bell shaped
Only need to know mean and standard deviation to define the whole curve
68% of all observations lie within +/- 1 sd of the mean, 95% within 2 sd, 99.7% within 3 sd
The standard score is the number of standard deviations an observation is away from the mean std. score = (obs - mean)/ SD
Once we have the standard score, we can look up P(X < standard score) in Table B of the book |
|
|
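A minimal Python sketch of the standard score from the card above, using NormalDist from the standard library in place of the Table B lookup; the mean, sd, and observation are made up:

```python
from statistics import NormalDist

mean, sd = 100, 15       # hypothetical normal distribution
obs = 130                # hypothetical observation

z = (obs - mean) / sd    # standard score: 2.0
p = NormalDist().cdf(z)  # area to the left of z, replacing Table B

print(f"z = {z}, P(X < {obs}) = {p:.4f}")   # about 0.9772
```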
Term
| Central Limit Theorem pt. 1 |
|
Definition
| As we take larger and larger samples, the sum or average (not product or ratio) will begin to look like a normal curve |
|
|
Term
| Central Limit Theorem pt. 2 |
|
Definition
| If we take many samples and compute the sample proportion each time, the distribution of those proportions will be a normal distribution with mean = p and standard deviation √ [ p (1-p) / n ] |
|
|
Term
| Central Limit Theorem pt. 3 |
|
Definition
| We would expect 95% of all p-hats to be within 2 sd of the mean, or p ± 2√ [ p (1-p) / n ] |
|
|
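A simulation sketch of pts. 2-3: repeatedly draw samples, compute p̂ each time, and check that roughly 95% of the p̂s fall within 2 sd of p. All the numbers below are arbitrary choices:

```python
import math
import random

random.seed(1)
p, n, reps = 0.3, 500, 10_000              # true proportion, sample size, repetitions
sd = math.sqrt(p * (1 - p) / n)            # sd of p-hat from the card

# Each repetition: draw n observations, compute the sample proportion.
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

inside = sum(abs(ph - p) <= 2 * sd for ph in p_hats) / reps
print(f"sd of p-hat: {sd:.4f}")
print(f"fraction within p +/- 2 sd: {inside:.3f}")  # close to 0.95
```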
Term
| Central Limit Theorem pt. 4 |
|
Definition
| When we don’t know p but have an estimate p̂, substitute the estimate into the formula: p̂ ± 2√ [ p̂ (1-p̂) / n ] |
|
|
Term
| Test of significance |
|
Definition
The basic idea is that we will reject the null hypothesis if our observation would be very unlikely to happen if the null hypothesis were true
Null hypothesis: the status quo, or the no-change option
Alternative hypothesis: usually what we are trying to prove |
|
|
Term
| Calculating test of significance |
|
Definition
Assume the null hypothesis is true, and calculate how likely our sample result would be:
1. Determine the mean and standard deviation of our “null distribution” (the distribution when the null is true): mean = p and sd = √ [ p (1-p) / n ]
2. Find the standard score of p̂: (p̂ - p) / √ [ p (1-p) / n ]
3. Look up the value in the table. ***You may need to subtract the value from 1 depending on whether you want the area to the left or to the right of the standard score***
This is the p-value: the probability that something as extreme or more extreme than our current observation would occur when the null is true
If the p-value is less than .05, reject the null hypothesis |
|
|
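A minimal Python sketch of the steps above, with made-up numbers (the null value, p̂, and n are all hypothetical):

```python
import math
from statistics import NormalDist

# Null hypothesis: p = 0.5.  We observed p-hat = 0.56 in a sample of n = 400.
p0, p_hat, n = 0.5, 0.56, 400

sd = math.sqrt(p0 * (1 - p0) / n)   # sd of the null distribution: 0.025
z = (p_hat - p0) / sd               # standard score of p-hat: 2.4

# One-sided p-value: area to the RIGHT of z (this is the
# "subtract from 1" caution on the card).
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")  # about 0.0082
if p_value < 0.05:
    print("Reject the null hypothesis")
```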