Term: probability
Definition: the relative frequency of events

Term: binomial distribution
Definition: a simple model that assumes only two outcomes are possible; it models the probability of observing k events among a sample of n individuals
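As a quick sketch of the binomial model (Python with scipy; the values n = 10 and p = 0.3 are made up for illustration):

```python
from scipy.stats import binom

# P(k events among n individuals); n and p are arbitrary example values
n, p = 10, 0.3
probs = [binom.pmf(k, n, p) for k in range(n + 1)]

print(round(probs[3], 4))    # probability of exactly k = 3 events (about 0.267)
print(round(sum(probs), 4))  # the pmf sums to 1 over all possible k
```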
        
Term: Bernoulli distribution
Definition: the case when n = 1 and we are interested in the probability of observing a case with a single draw from a binomially distributed population

Term: name three ways to tell if something is Gaussian distributed
Definition:
-investigate the histogram
-Q-Q plot
-apply a significance test (Shapiro-Wilk test)

Term: Q-Q plot
Definition: compares the quantiles of an observed frequency distribution to the quantiles of an expected distribution; used for testing for a Gaussian distribution

Term: Shapiro-Wilk test
Definition: the null hypothesis assumes the sample is Gaussian; if the test is significant, we reject the null and conclude the sample is not Gaussian
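A minimal sketch of the Shapiro-Wilk test in Python (scipy; the samples are simulated, not real data):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)
gaussian_sample = rng.normal(loc=50, scale=5, size=200)   # simulated Gaussian data
skewed_sample = rng.exponential(scale=5, size=200)        # simulated skewed data

# null hypothesis: the sample comes from a Gaussian population;
# a significant p-value (< 0.05) means we reject normality
_, p_gauss = shapiro(gaussian_sample)
_, p_skew = shapiro(skewed_sample)
print(round(p_gauss, 4), round(p_skew, 6))
```

The skewed sample should give a very small p-value, leading to rejection of normality.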
        
Term: de Moivre-Laplace theorem
Definition: as the success probability/prevalence of a binomial distribution approaches 0.5, or as the sample size n increases, the binomial distribution becomes more symmetric and approaches a Gaussian distribution
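The theorem can be illustrated numerically (Python/scipy sketch; n = 100 and p = 0.5 are arbitrary choices), comparing an exact binomial probability with its normal approximation:

```python
import math
from scipy.stats import binom, norm

# with p near 0.5 and n large, Binomial(n, p) ~ Normal(np, sqrt(np(1-p)))
n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = binom.cdf(55, n, p)            # exact binomial P(X <= 55)
approx = norm.cdf(55.5, mu, sigma)     # 0.5 added: the continuity correction
print(round(exact, 4), round(approx, 4))
```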
        
Term: deviation
Definition: the deviation of each observation from the mean

Term: variance
Definition: the average of the squared deviations of the observations from the mean

Term: coefficient of variation
Definition: the standard deviation expressed as a percentage of the mean
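These three quantities can be computed directly (Python sketch; the observations are made up):

```python
import statistics

observations = [4.0, 5.5, 6.1, 5.0, 4.4]   # hypothetical measurements
mean = statistics.mean(observations)

deviations = [x - mean for x in observations]   # deviation of each observation
variance = statistics.variance(observations)    # average squared deviation (n-1 denominator)
sd = statistics.stdev(observations)
cv = 100 * sd / mean                            # coefficient of variation, % of the mean

print(round(sum(deviations), 10))   # raw deviations always sum to 0
print(round(cv, 1))
```

Because the raw deviations always sum to zero, the squares are what carry the information about spread.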
        
Term: the simplest method in R to estimate the mean and its confidence interval
Definition: the t.test() function

Term: addition rule (probability)
Definition: when two events are mutually exclusive (cannot occur at the same time), the probability of either occurring is the sum of the probabilities of the individual events

Term: multiplication rule (probability)
Definition: when two events are independent (the occurrence of one does not affect the other), the probability of both events occurring is the product of their individual probabilities
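Both rules amount to simple arithmetic (Python sketch; the card and coin examples are illustrative, not from the deck):

```python
# addition rule: mutually exclusive events, P(A or B) = P(A) + P(B)
# e.g. drawing one card: it cannot be both an ace and a king
p_ace, p_king = 4 / 52, 4 / 52
p_ace_or_king = p_ace + p_king
print(round(p_ace_or_king, 4))   # 0.1538

# multiplication rule: independent events, P(A and B) = P(A) * P(B)
# e.g. two fair coin flips both landing heads
p_heads = 0.5
p_two_heads = p_heads * p_heads
print(p_two_heads)               # 0.25
```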
        
Term: conditional probability
Definition: used when two events are not independent; the probability of A occurring when we know B has occurred

Term: when is the binomial distribution used (list 2)
Definition:
-when investigating a binary response (only two possible outcomes)
-for analyzing proportions and making inferences about them
        
Term: what do we do when data is skewed right in a Gaussian distribution
Definition: use the lognormal distribution

Term: properties of the Gaussian distribution (6)
Definition:
-described by 2 parameters (mean, SD)
-unimodal
-symmetrical about the mean
-mean, median and mode are all equal
-if the SD doesn't change but the mean increases, the curve shifts right (if the mean decreases, it shifts left)
-decreasing the SD makes the curve thinner; increasing the SD makes it fatter
        
Term: properties of the t distribution (3)
Definition:
-symmetrical about the mean
-characterized by degrees of freedom
-with large degrees of freedom, it looks like the normal distribution

Term: properties of the chi-squared distribution (2)
Definition:
-can only take positive values; highly skewed
-characterized by degrees of freedom (approaches normal when large)

Term: properties of the F distribution (3)
Definition:
-the distribution of a ratio
-two separate degrees of freedom (numerator and denominator)
-tabulated probabilities relate to a ratio > 1
        
Term: two distributions used when we are dealing with discrete variables
Definition: the binomial and Poisson distributions

Term: the normal distribution is used for what sort of variable
Definition: continuous (numerical) variables

Term: when do we use a continuity correction in the normal distribution
Definition: when we use tables of the normal distribution to approximate the Poisson or binomial distribution
        
Term: what is the sampling distribution of the mean and what does it depend on
Definition: the extent to which a sample mean differs from the population mean; it depends on
-the size of the sample (larger means less error)
-the variability of the observations (error is greater if the sample is more diverse)

Term: what are the properties of the sampling distribution of the mean (3)
Definition:
-normally distributed if the parent distribution is normal (assume normality if the sample size is >30)
-the mean of the sampling distribution of the mean is the same as that of the parent population
-its standard deviation is known as the standard error of the mean (smaller with larger sample sizes)

Term: what is the difference between the standard error of the mean and the standard deviation
Definition: the SD measures the scatter of the observations, whereas the SEM measures the precision of the sample mean as an estimate of the population mean

Term: what is a confidence interval (for the mean)
Definition: a range of values, defined by upper and lower limits, within which we expect the true population mean to lie with a certain probability
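In R this is typically done with t.test(); an equivalent sketch in Python (scipy; the sample values are made up):

```python
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7])  # hypothetical data
n = len(sample)
mean = sample.mean()
sem = stats.sem(sample)   # standard error of the mean = sd / sqrt(n)

# 95% confidence interval based on the t distribution with n-1 degrees of freedom
low, high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(round(mean, 2), round(low, 2), round(high, 2))
```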
        
Term: what is a null hypothesis
Definition: the converse of the study hypothesis (we usually try to disprove it)

Term: what is an alternative hypothesis
Definition: states that there is a difference between parameter values but that the direction is not known (therefore it usually leads to a two tailed test); if we know one treatment can only be better and not worse, we may use a one sided test

Term: p-value
Definition: the chance of getting the observed effect if the null hypothesis is true

Term: type I error
Definition: when the two means are equal but we have rejected the null hypothesis when we should not have rejected it; we limit the probability of a type I error to be less than alpha (the significance level)

Term: type II error
Definition: when the two means differ but we have not rejected the null when we should have
-the probability of a type II error is designated by beta
-1 - beta is the power of a test
        
Term: what are the different types of t-test; give a brief description
Definition:
-one sample t-test: comparing a mean/expected value to a reference value
-two sample t-test: comparing the means/expected values of two independent populations
-Welch's test: a version of the two sample test for when the variances are unequal
-paired t-test: used when the data are not independent (paired), so it reduces to a one sample t-test
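A sketch of the four variants (Python/scipy; simulated data), including a check that the paired test reduces to a one sample test on the pairwise differences:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(10.0, 2.0, size=30)   # simulated measurements
b = rng.normal(11.5, 2.0, size=30)

# one sample t-test: does the mean of a differ from a reference value of 10?
t1, p1 = stats.ttest_1samp(a, popmean=10.0)

# two sample t-test (equal variances assumed)
t2, p2 = stats.ttest_ind(a, b)

# Welch's test: two sample t-test without assuming equal variances
tw, pw = stats.ttest_ind(a, b, equal_var=False)

# paired t-test: equivalent to a one sample t-test on the differences
tp, pp = stats.ttest_rel(a, b)
same_as_paired = stats.ttest_1samp(a - b, popmean=0.0)
print(bool(np.isclose(tp, same_as_paired.statistic)))   # True
```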
        
Term: what are the two main assumptions for using a t-test
Definition:
-the mean of the sample data is Gaussian distributed
-the unknown variance can be estimated by the sample variance

Term: describe the one sample t-test
Definition: tests whether the mean/expected value differs from a reference value

Term: describe the two sample t-test
Definition: used when we have data from two independent populations and want to compare the means/expected values

Term: when is Welch's test used
Definition: when using a two sample t-test but the variances are unequal; the standard error of the two sample t-test is modified

Term: describe the paired t-test
Definition: used when we want to use a two sample t-test but the data are paired (not independent), which reduces it to a one sample t-test

Term: two assumptions of the one sample t-test
Definition:
-the sample data come from a normally distributed population
-the values are representative of the population

Term: assumptions of the two sample t-test (3)
Definition:
-samples must be independent and representative of the population
-approximately normally distributed
-variances should be approximately equal
        
Term: what is the Wilcoxon rank sum test
Definition: a non-parametric alternative to the two sample t-test, used when its assumptions do not hold; the data are transformed to ranks before comparison
        
Term: assumptions of a paired t-test
Definition: the difference between the observations of each pair is approximately normally distributed

Term: what are the assumptions of the F test
Definition:
-samples are independent and come from normally distributed populations
-samples are representative of the population

Term: F test
Definition: tests for the equality of two variances

Term: what is Levene's test
Definition: used to compare two or more variances; the test statistic follows the F distribution
-less dependent on the assumptions of the F test
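Both variance tests sketched in Python (scipy; simulated samples with deliberately unequal spread):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(0, 1.0, size=40)
y = rng.normal(0, 3.0, size=40)   # clearly larger spread

# F test for equality of two variances: ratio of the sample variances,
# compared against the F distribution with (n1-1, n2-1) degrees of freedom
f_ratio = np.var(y, ddof=1) / np.var(x, ddof=1)
p_f = 2 * min(stats.f.sf(f_ratio, 39, 39), stats.f.cdf(f_ratio, 39, 39))

# Levene's test: same question, less dependent on normality
stat, p_levene = stats.levene(x, y)
print(p_f < 0.05, p_levene < 0.05)   # both should reject equal variances here
```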
        
Term: what does ANOVA stand for and what is it used for
Definition: analysis of variance; it compares the means of two or more groups by investigating their variances

Term: what does the one way ANOVA do
Definition: it is an extension of the two sample t-test for when we compare the means of more than two groups
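A minimal one way ANOVA sketch (Python/scipy; simulated groups, one with a shifted mean):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(5.0, 1.0, size=25)
group_b = rng.normal(5.0, 1.0, size=25)
group_c = rng.normal(7.0, 1.0, size=25)   # shifted mean

# one way ANOVA: do the means of three or more groups differ?
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(p_value < 0.05)   # the shifted group should make this significant
```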
        
Term: describe one way repeated measures ANOVA
Definition: an extension of the paired t-test for when we are comparing three or more treatments

Term: two way ANOVA
Definition: examines the effect of two factors on a response variable

Term: assumptions of the one way ANOVA
Definition:
-the variable of interest is numerical
-samples are independent and come from normally distributed populations

Term: what is Bonferroni's correction used for
Definition: when we reject the null in a one way ANOVA and we need to know which of the group means differ

Term: what are the most appropriate tests for comparing the means of one or more populations when we have continuous variables
Definition: the t-tests and ANOVA

Term: what tests should we use for categorical variables (i.e. binary)
Definition: chi square test, Fisher's exact test, Cochran-Armitage test, McNemar's test
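A sketch of the chi square and Fisher's exact tests on a hypothetical 2x2 table (Python/scipy; the counts are invented):

```python
import numpy as np
from scipy import stats

# hypothetical 2x2 table: rows = treatment/control, columns = diseased/healthy
table = np.array([[12, 28],
                  [25, 15]])

# chi square test of independence
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Fisher's exact test: preferred when expected counts are small
odds_ratio, p_fisher = stats.fisher_exact(table)
print(dof, round(odds_ratio, 2))   # dof = 1 for a 2x2 table
```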
        
Term: what does the Pearson correlation coefficient do
Definition: describes the strength of the linear relation (aka correlation) between two variables

Term: what is the purpose of a linear regression model
Definition: describes the linear relationship between two variables by means of a mathematical equation
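Both the correlation coefficient and the regression line can be obtained in one call (Python/scipy sketch; the x, y values are made up):

```python
import numpy as np
from scipy import stats

# hypothetical paired measurements with a roughly linear relation
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

result = stats.linregress(x, y)
print(round(result.rvalue, 3))                         # Pearson correlation coefficient
print(round(result.slope, 2), round(result.intercept, 2))  # fitted line y = a + bx
print(round(result.rvalue ** 2, 3))                    # coefficient of determination
```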
        
Term: what type of distribution is most appropriate for categorical variables
Definition: the binomial distribution

Term: describe how Fisher's exact test would be used
Definition: when we are testing for an association between categorical variables from independent groups with a small sample size (<20)

Term: when is the Cochran-Armitage test used
Definition: when we are testing for a trend in the proportions of categorical variables

Term: when is McNemar's test used
Definition: when we have paired groups of categorical variables and we want to test for agreement

Term: name three different types of chi squared tests
Definition:
-McNemar's test
-Cochran-Armitage test
-Fisher's exact test
        
Term: what would the value of the correlation coefficient be if there was perfect correlation
Definition: +1 (or -1 for a perfect negative correlation)

Term: what would the value of the correlation coefficient be if there was no correlation
Definition: 0

Term: what assumptions need to be made when testing the correlation coefficient
Definition:
-both variables (X and Y) are numeric
-at least one of the variables is normally distributed

Term: under what circumstances should we not calculate the correlation coefficient
Definition:
-when the relationship between the variables is non-linear
-the observations are not independent
-outliers are present
        
Term: what is the point of linear regression
Definition: to model a linear relation between an outcome variable and one or more predictor/explanatory variables

Term: is the outcome in a linear regression model the dependent or the independent variable
Definition: the dependent variable

Term: true or false: a linear correlation proves causation
Definition: false

Term: true or false: a linear regression model proves causation
Definition: false

Term: residuals
Definition: the differences between the observed outcome y and its model predicted value (ŷ)
        
Term: what assumptions need to be true for linear regression models
Definition:
-the residuals should be approximately Gaussian distributed
-the relationship between x and y is linear
-the observations are independent
-for each value of x, the population values of y are normally distributed

Term: what does a linear regression model describe and how
Definition: the relationship between 2 numerical variables, by determining the straight line that most closely approximates the data points on a scatter diagram

Term: if a data point in a linear regression model has high leverage, what might this imply
Definition: it may be an outlier; any point with leverage greater than 4/n should be investigated

Term: Cook's distance
Definition: a standardized measure of the change in the parameters of the regression equation if the data point were omitted
        
Term: at what distance according to Cook's distance is a point influential
Definition: a Cook's distance greater than 1 is commonly taken to indicate an influential point
        
Term: describe the coefficient of determination
Definition: measures the fit of the regression model: how much of the variation in the outcome is explained by the variation in the predictor variable; it is the square of the correlation coefficient
        
Term: what is the difference between simple and multiple linear regression
Definition:
-simple: only one predictor variable
-multiple: many predictor variables contribute to the explanation of an outcome in one model

Term: what is logistic regression and when is it used
Definition: used when we have a categorical or binary outcome; it models the influence of predictor variables
-it is an extension of the chi squared/Cochran-Armitage tests between a binary outcome and an ordered predictor variable
        
Term: what are the assumptions of multiple linear regression
Definition:
-there is a linear relationship between the response variable and each explanatory variable
-the residuals are independent (each individual appears once in the sample)
-the residuals are normally distributed with zero mean and constant variance

Term: what does it mean if a regression coefficient in a logistic regression model has a large standard error
Definition: it suggests possible collinearity among the predictors

Term: why is the coefficient of determination not a good measure to compare multiple regression models
Definition: it cannot decrease when more variables are included in the model

Term: what is the adjusted R squared
Definition: can be interpreted as the percentage reduction in the variance of the model predicted residuals compared with the residuals in the observed data y

Term: how can we check the goodness of fit of a multiple regression model
Definition:
-check the model assumptions (linearity, Gaussian residuals, homogeneity of variance)
-check the model fit (Wald test p-value; outlier, leverage and influential observations)
-compare models (adjusted R squared, ANOVA, AIC)
        
Term: AIC
Definition: Akaike information criterion; an alternative to R squared used to compare regression models
-the model with the lower value is the better fitting model

Term: what is logistic regression
Definition: an extension of the Pearson chi square test, used to investigate the relation of a binary outcome to multiple predictors

Term: true or false: the residuals in a logistic regression model are Gaussian
Definition: false; unlike linear regression, they are not Gaussian

Term: what does the slope of a logistic regression model represent
Definition: the change in the log odds of the outcome per unit change in the predictor (the log odds ratio)
        
Term: what is survival analysis
Definition: analysis in which the outcome of interest is the time from a certain starting point to the occurrence of an event; sometimes called "time to event" analysis

Term: censored
Definition: in a survival analysis, describes animals that never experience the outcome of interest

Term: what is uninformative censoring
Definition: when the probability that an animal is censored is not related to the probability that it experiences the outcome of interest

Term: what is administrative censoring
Definition: a form of right censoring; animals enter the study at different times but the study ends at the same time, so not all animals were followed for the same amount of time

Term: censoring
Definition: in survival analysis, when for part of the study population the time to the event is not known

Term: what is interval censoring
Definition: when the exact time to the event is not known but is approximated (known only to lie within an interval)

Term: what is the Kaplan-Meier estimator and how is it used
Definition: an estimator of the survival probability: the probability of surviving from a start point to a particular point in time
-can be used when the survival and censoring times are known exactly
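A hand-rolled sketch of the estimator in pure Python (illustrative only; in practice a library such as lifelines would be used, and the times and censoring flags below are invented):

```python
def kaplan_meier(times, events):
    """times: follow-up times; events: 1 = event observed, 0 = censored."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = sum(1 for tt, e in pairs if tt == t and e == 1)
        removed = sum(1 for tt, e in pairs if tt == t)
        if deaths > 0:
            # multiply by the fraction surviving this event time
            survival *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed   # deaths and censored animals leave the risk set
        i += removed
    return curve

# hypothetical survival times (months); events marks deaths (1) vs censored (0)
times = [2, 3, 3, 5, 7, 8, 8, 10]
events = [1, 1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Note how censored animals still count in the risk set at their censoring time but contribute no drop in the curve, matching the assumption that they survive at least as long as animals dying at the same time.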
        
Term: what does the Kaplan-Meier method assume
Definition: that animals lost to follow-up at a given time survive longer than those who die at that time

Term: what does the logrank test allow us to do
Definition: to compare the survival curves of two groups; the test statistic follows a chi square distribution