Shared Flashcard Set

Details

POPM 6520 Stats
Statistics half of the course taught by Olaf Berke
95
Other
Graduate
12/11/2015

Cards

Term
Probability
Definition
relative frequency of events
Term
binomial distribution
Definition
a simple model that assumes only two outcomes are possible.
models the probability of observing k events among a sample of n independent individuals, each with the same event probability.
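As a rough illustration of the binomial model, the sketch below computes the probability of k events among n individuals. The function name `binomial_pmf` and the values n = 10 and p = 0.2 are assumptions made up for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k events among n independent individuals."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: 10 animals, event probability (prevalence) 0.2
n, p = 10, 0.2
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

p_two_cases = pmf[2]                   # P(exactly 2 events)
total = sum(pmf)                       # probabilities over all k add up to 1
p_single_draw = binomial_pmf(1, 1, p)  # n = 1: a single draw, which equals p
```

With n = 1 the same formula gives the probability of a case in a single draw.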
Term
Bernoulli distribution
Definition
when n=1 and we are interested in the probability of observing a case with a single draw from a binomially distributed population
Term
name three ways to tell if something is Gaussian distributed
Definition
-investigate histogram
-qq plot
-apply a significance test (Shapiro-Wilk test)
Term
QQ plot
Definition
compares quantiles of an observed frequency distribution to quantiles of an expected distribution.
used for testing Gaussian distribution
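A minimal sketch of the numbers behind a QQ plot, assuming a made-up sample and the plotting positions (i - 0.5)/n, which are one common convention among several. Plotting the pairs and checking whether they fall near a straight line is left out.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample, sorted so the i-th value is the i-th observed quantile
sample = sorted([4.2, 5.1, 5.5, 5.9, 6.0, 6.4, 6.8, 7.3, 8.1])
n = len(sample)

# Gaussian distribution with the same mean and SD as the sample
fitted = NormalDist(mean(sample), stdev(sample))

# Expected quantiles at plotting positions (i - 0.5) / n (an assumed convention)
expected = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each pair is one point of the QQ plot: (expected quantile, observed quantile)
qq_pairs = list(zip(expected, sample))
```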
Term
Shapiro-Wilk test
Definition
the null assumes the sample is Gaussian; if the test is significant we reject the null and accept the alternative that the sample is not Gaussian
Term
de Moivre-Laplace Theorem
Definition
as the sample size n increases, the binomial distribution approaches a Gaussian distribution with mean np and variance np(1-p); the closer the success probability/prevalence p is to 0.5, the more symmetric the binomial distribution and the better the approximation
Term
variance
Definition
average of the squared deviations of the observations from the mean
Term
standard deviation
Definition
square root of the variance; a measure of the typical deviation of the observations from the mean, in the same units as the data
Term
coefficient of variation
Definition
standard deviation expressed as percentage of the mean
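The three measures of spread above can be sketched with Python's `statistics` module. The sample values are invented for illustration, and the sample (n - 1) versions of variance and SD are assumed.

```python
import statistics

weights = [4.1, 5.0, 5.6, 6.3, 7.0]  # hypothetical sample

sample_mean = statistics.mean(weights)
variance = statistics.variance(weights)  # sum of squared deviations / (n - 1)
sd = statistics.stdev(weights)           # square root of the variance
cv = 100 * sd / sample_mean              # SD as a percentage of the mean
```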
Term
the simplest method in R to estimate the mean and its confidence interval
Definition
t.test
Term
addition rule (probability)
Definition
when two events are mutually exclusive (cannot occur at the same time), the probability of either occurring is the sum of the probability of each event
Term
multiplication rule (probability)
Definition
when two events are independent (occurrence of one does not affect the other), the probability of both events occurring is the product of the individual probabilities
Term
conditional probability
Definition
the probability of A occurring when we know B has occurred: P(A|B) = P(A and B) / P(B); it differs from P(A) when the two events are not independent
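The addition rule, multiplication rule and conditional probability can be checked on a fair six-sided die. This is an illustrative sketch only; the helper `prob` and the chosen events are assumptions for the example.

```python
from fractions import Fraction

DIE = range(1, 7)  # outcomes of one fair six-sided die

def prob(event):
    """Probability that a single roll satisfies the given condition."""
    return Fraction(sum(1 for x in DIE if event(x)), 6)

# Addition rule: "roll a 1" and "roll a 2" are mutually exclusive
p_one = prob(lambda x: x == 1)
p_two = prob(lambda x: x == 2)
p_one_or_two = prob(lambda x: x in (1, 2))   # equals p_one + p_two

# Multiplication rule: two separate rolls are independent
p_six = prob(lambda x: x == 6)
p_two_sixes = p_six * p_six                  # 1/36

# Conditional probability: P(A | B) = P(A and B) / P(B)
# A = "roll a 2", B = "roll an even number"
p_b = prob(lambda x: x % 2 == 0)
p_a_and_b = prob(lambda x: x == 2 and x % 2 == 0)
p_a_given_b = p_a_and_b / p_b                # 1/3, larger than P(A) = 1/6
```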
Term
when is the binomial distribution used (list 2)
Definition
-when investigating a binary response (only two possible outcomes)
-for analyzing proportions and making inferences about them
Term
what do we do when data is skewed right in Gaussian distribution
Definition
log-transform the data (i.e. assume a lognormal distribution)
Term
properties of Gaussian distribution (6)
Definition
-described by 2 parameters (mean, SD)
-unimodal
-symmetrical about the mean
-mean, median and mode all equal
-if SD doesn't change, but mean increases then curve shifts right (decrease and it shifts left)
-decrease SD makes curve thinner, increase SD makes it fatter
Term
properties of t distribution (3)
Definition
-symmetrical about the mean
-characterized by degrees of freedom
-when large degrees of freedom, looks like normal distribution
Term
properties of chi-squared distribution (2)
Definition
-can only take positive values, highly skewed
-characterized by degrees of freedom (approaches normal when large)
Term
properties of F-distribution (3)
Definition
-distribution of a ratio
-two separate degrees of freedom (numerator and denominator)
-tabulated probabilities relate to ratio>1
Term
two distributions used when we are dealing with discrete variables
Definition
binomial and Poisson
Term
normal distribution is used for what sort of variable
Definition
continuous
Term
when do we use continuity correction in normal distribution
Definition
if we use tables of the normal distribution to approximate the Poisson or binomial distribution
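A small sketch of why the continuity correction helps, assuming a binomial with n = 20 and p = 0.5 and the cumulative probability P(X <= 8). The numbers are chosen purely for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 20, 0.5, 8  # hypothetical binomial setting

# Exact binomial probability P(X <= k)
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Normal approximation with the same mean and standard deviation
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))

without_correction = approx.cdf(k)     # treats the discrete k as a point
with_correction = approx.cdf(k + 0.5)  # adds half a unit to cover all of "k"
```

In this example the corrected value lies much closer to the exact probability than the uncorrected one.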
Term
what is the sampling distribution of the mean and what does it depend on
Definition
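the distribution of the sample means we would obtain from repeated samples of the same size; it describes the extent to which a sample mean differs from the population mean
depends on:
-size of the sample (larger means less error)
-variability of the observations (error greater if sample more diverse)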
Term
what are the properties of sampling distribution of the mean (3)
Definition
-normal distribution if parent distribution is normal (assume normality if sample size >30)
-mean of the sampling distribution of the mean is same as parent pop
-standard deviation known as standard error of the mean (smaller with larger sample sizes)
Term
what is the difference between standard error of the mean and standard deviation
Definition
-SD measures scatter of the observations where SEM measures precision of the sample mean as an estimate of the population mean
Term
what is a confidence interval (for the mean)
Definition
defined by upper and lower limits; a range of values within which we expect the true population mean to lie with a stated level of confidence (e.g. 95%)
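The standard error of the mean and an approximate 95% confidence interval can be sketched as follows. This example assumes a large-sample normal (z) quantile because the Python standard library has no t distribution; a t-based interval, such as the one R's t.test reports, would be somewhat wider for a small sample like this one. The data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

sample = [5.2, 4.8, 6.1, 5.5, 5.9, 4.7, 5.3, 6.0, 5.1, 5.4]  # hypothetical
n = len(sample)

sample_mean = mean(sample)
sd = stdev(sample)   # scatter of the individual observations
sem = sd / sqrt(n)   # precision of the sample mean; shrinks as n grows

# Normal approximation: z is about 1.96 for a 95% interval (assumption: large n)
z = NormalDist().inv_cdf(0.975)
lower = sample_mean - z * sem
upper = sample_mean + z * sem
```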
Term
what is a null hypothesis
Definition
the statement of no effect or no difference; the converse of the study hypothesis (we usually try to disprove it)
Term
what is an alternate hypothesis
Definition
states there is a difference between parameter values; when the direction is not known this leads to a two tailed test
if we know one treatment can only be better and not worse we may use a one sided test
Term
what is a p-value
Definition
the probability of getting the observed effect, or a more extreme one, if the null hypothesis is true
Term
describe type I error
Definition
if the two means are equal and we have rejected the null hypothesis when we should not have rejected it
-limit the probability of TI error to be less than alpha (significance level)
Term
describe type II error
Definition
if the two means differ and we have not rejected the null when we should have
-probability of TII error designated by beta
-1-beta is the power of a test
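To make alpha, beta and power concrete, here is a sketch for a one-sided z-test with known standard deviation. The z-test is a simplifying assumption chosen so the example needs only the standard library, and `power_one_sided_z` and its arguments are hypothetical names.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Power (1 - beta) of a one-sided z-test to detect a true mean shift delta >= 0."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # rejection threshold under the null
    return 1 - NormalDist().cdf(z_crit - delta * sqrt(n) / sigma)

# If the null is true (delta = 0), the rejection probability is alpha (type I error)
type_one_rate = power_one_sided_z(0.0, 1.0, 10)

# For a real difference, power grows with sample size, so beta shrinks
power_small = power_one_sided_z(0.5, 1.0, 10)
power_large = power_one_sided_z(0.5, 1.0, 40)
beta_small = 1 - power_small
```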
Term
what are the different types of t-test and give a brief description
Definition
-one sample t-test: comparing mean/expected value to a reference value
-two sample t-test: comparing means/expected values of two independent populations
-Welch's test: a version of two sample test when variances are unequal
-paired t-test: when data is not independent (paired), so it is reduced to a one sample t-test
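The four versions listed above differ mainly in how the standard error is built. The sketch below computes only the test statistics (no p-values, since the standard library has no t distribution); the function names and data are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, variance

def one_sample_t(x, mu0):
    """Compare the mean of x to a reference value mu0."""
    return (mean(x) - mu0) / sqrt(variance(x) / len(x))

def two_sample_t(x, y):
    """Two independent samples, equal variances assumed (pooled variance)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))

def welch_t(x, y):
    """Two independent samples, variances not assumed equal."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))

def paired_t(before, after):
    """Paired data: a one sample t-test on the within-pair differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0)

# Hypothetical measurements on the same five animals before and after treatment
before = [10.1, 11.4, 9.8, 12.0, 10.7]
after = [11.0, 12.1, 10.5, 12.2, 11.9]
t_paired = paired_t(before, after)
t_independent = two_sample_t(after, before)  # ignores the pairing
```

In this made-up example the paired statistic is larger than the independent one because pairing removes the between-animal variation.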
Term
what are the two main assumptions for using a t-test
Definition
-mean of the sample data is Gaussian distributed
-unknown variance can be estimated by sample variance
Term
describe the one sample t-test
Definition
tests whether the mean/expected value differs from a reference value
Term
describe the two sample t-test
Definition
if we have data from independent populations and want to compare the means/expected values
Term
when is the Welch's test used
Definition
when using a two sample t-test but the variances are unequal; the standard error of the two sample t-test is modified
Term
describe the paired t-test
Definition
when we want to use two sample t-test but the data are paired (not independent), this will reduce it to a one sample t-test
Term
two assumptions of the one sample t-test
Definition
-sample data from normally distributed population
-values are representative of the population
Term
assumptions of the two sample t-test (3)
Definition
-samples must be independent and representative of the population
-approx. normally distributed
-variances should be approx. equal
Term
what is the Wilcoxon rank sum test
Definition
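a non-parametric alternative to the two sample t-test that compares the ranks of the observations in two independent groups; used when the assumptions of the t-test (e.g. normality) are not met and cannot be fixed by transforming the data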
Term
assumptions of a paired t-test
Definition
-the difference between the observations of each pair is approx. normally distributed
Term
what are the assumptions of the F test
Definition
-samples are independent and from normally distributed population
-samples are representative of the population
Term
what does the F test do
Definition
tests for the equality of two variances
Term
what is the Levene's test
Definition
used to compare two or more variances
test statistic follows the F distribution
-less dependent on the assumptions for the F test
Term
what does ANOVA stand for and what is it used for
Definition
analysis of variance
compares the means of two or more groups by investigating their variances
Term
what does the one way ANOVA do
Definition
it is an extension of the two sample t-test for when we compare the means of more than two groups
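A hand-rolled sketch of the one way ANOVA F statistic: between-group variance divided by within-group variance. The function name and the group values are assumptions for the example, and no p-value is computed.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical data: three groups of three observations
f_three_groups = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

# With only two groups, F equals the square of the pooled two sample t statistic
f_two_groups = one_way_anova_f([[1, 2, 3], [2, 3, 4]])
```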
Term
describe one way repeated measures ANOVA
Definition
extension of the paired t-test when we are comparing three or more treatments
Term
describe two way ANOVA
Definition
examines the effect of two factors on a response variable
Term
assumptions of the one way ANOVA
Definition
-variable of interest is numerical
-samples are independent and come from normally distributed population
Term
what is bonferroni's correction used for
Definition
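used for multiple comparisons, e.g. when we reject the null in a one way ANOVA and need to know which of the group means differ; the significance level is divided by the number of comparisons to control the overall type I error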
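A short sketch of the Bonferroni adjustment for all pairwise comparisons among four hypothetical groups. The group labels and alpha = 0.05 are assumptions for the example.

```python
from itertools import combinations

groups = ["A", "B", "C", "D"]          # hypothetical treatment groups
pairs = list(combinations(groups, 2))  # every pairwise comparison

alpha = 0.05
adjusted_alpha = alpha / len(pairs)    # each comparison is tested at this level

# Chance of at least one false positive if each of the tests used alpha itself
# (assuming, for simplicity, that the comparisons are independent)
uncorrected_familywise = 1 - (1 - alpha) ** len(pairs)
```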
Term
what are the most appropriate tests for comparing the mean of one or more populations when we have continuous variables
Definition
t-test and F-test
Term
what test should we use for categorical variables (ie binary)
Definition
chi-square test, Fisher's exact test, Cochran-Armitage test, McNemar's test
Term
what does the Pearson correlation coefficient do
Definition
describes the strength of the linear relation (aka correlation) between two variables
Term
what is the purpose of a linear regression model
Definition
describes the linear relationship between two variables using a mathematical equation
Term
what type of distribution is most appropriate for categorical variables
Definition
chi-squared
Term
describe how Fisher's exact test would be used
Definition
when we are testing for an association between categorical variables from independent groups of small sample size (<20)
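For a small 2x2 table, the exact p-value can be computed directly from the hypergeometric distribution. The sketch below is an illustrative implementation, not a library routine; it uses the common two-sided convention of summing all tables that are no more probable than the observed one, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def table_prob(x):
        # Probability of x in the top-left cell with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over tables at least as unlikely as the observed one
    # (small tolerance guards against floating point ties)
    return sum(table_prob(x) for x in range(lowest, highest + 1)
               if table_prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical table with 8 observations in total
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```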
Term
when is the Cochran-Armitage test used
Definition
when we are testing for a trend in proportions of categorical variables
Term
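when is McNemar's test used
Definition
when we have paired groups of categorical (binary) variables and want to test whether the paired proportions differ (i.e. whether the pairs disagree more in one direction than the other)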
Term
name three different types of chi squared tests
Definition
McNemar's test
Cochran-Armitage test
Fisher's exact test
Term
what would the value of the correlation coefficient be if there was perfect correlation
Definition
+1 or -1
Term
what would the value be of the correlation coefficient if there was no correlation
Definition
0
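The values +1, -1 and 0 can be reproduced with a small hand-written Pearson correlation function. The function name and data are assumptions for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4]
r_perfect_positive = pearson_r(x, [2, 4, 6, 8])   # y rises exactly with x
r_perfect_negative = pearson_r(x, [8, 6, 4, 2])   # y falls exactly with x
r_none = pearson_r(x, [1, -1, -1, 1])             # U-shaped: no linear relation
```

The last case shows that a coefficient of 0 rules out only a linear relation; the variables can still be related in a non-linear way.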
Term
what assumptions need to be made when testing the correlation coefficient
Definition
-both variables (X and Y) are numeric
-one of the variables is normally distributed
Term
under what circumstances should we not calculate the correlation coefficient
Definition
-when there is a relationship between the variables that is non-linear
-observations are not independent
-outliers present
Term
what is the point of linear regression
Definition
to model a linear relation between an outcome variable and one or more predictor/explanatory variables
Term
the outcome in a linear regression model is the dependent or independent variable
Definition
dependent
Term
True or false: a linear correlation proves causation
Definition
false
Term
true or false: a linear regression model proves causation
Definition
false
Term
what are residuals
Definition
the differences between the observed outcome y and its model predicted value (y^)
Term
what assumption needs to be true for linear regression models
Definition
-residuals should be approximately Gaussian distributed
-relationship between x and y is linear
-observations are independent
-for each value of x, population values of y are normally distributed
Term
what does a linear regression model describe and how
Definition
the relationship between 2 numerical variables by determining a straight line that approximates the data points on a scatter diagram most closely
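A minimal least-squares sketch of fitting that straight line and computing the residuals, using invented data and a hypothetical helper `fit_line`.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return intercept, slope

x = [1, 2, 3, 4, 5]                # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]     # hypothetical outcomes

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * a for a in x]      # model predicted values
residuals = [b - f for b, f in zip(y, fitted)]   # observed minus predicted
```

For a least-squares line with an intercept, the residuals always sum to zero.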
Term
if a data point in a linear regression model has high leverage, what might this imply
Definition
it may be an outlier
any point with leverage greater than 4/n should be investigated
Term
what is Cook's distance
Definition
a standardized measure of the change in the parameters of the regression equation if the data point were omitted
Term
at what distance according to Cook's distance is a point influential
Definition
>1
Term
describe coefficient of determination
Definition
measures the fit of the regression model
how much variation in the outcome is explained by the variation in the predictor variable
in simple linear regression it is the square of the correlation coefficient
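The link between the coefficient of determination and the correlation coefficient can be checked numerically for a simple (one predictor) regression. The data below are made up.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]                # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]     # hypothetical outcomes

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)   # total variation in the outcome

slope = sxy / sxx
intercept = my - slope * mx

# Variation left unexplained by the fitted line
ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

r_squared = 1 - ss_residual / syy     # share of variation explained
r = sxy / (sxx * syy) ** 0.5          # Pearson correlation coefficient
```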
Term
what is the difference between simple and multiple linear regression
Definition
simple: only one predictor variable
multiple: many predictor variables contribute to the explanation of an outcome in one model
Term
what is logistic regression and when is it used
Definition
used when we have a categorical or binary outcome
models the influence of predictor variables on that outcome
it's an extension of the chi-squared/Cochran-Armitage tests between a binary outcome and an ordered predictor variable
Term
what are the assumptions of multiple linear regression
Definition
-there is a linear relationship between a response variable and each explanatory variable
-residuals are independent (each individual appears once in the sample)
-residuals are normally distributed with 0 mean and constant variance
Term
what should we do if the regression coefficient in a logistic regression model has a large standard error
Definition
suspect collinearity between predictors (a large standard error is a possible sign of it) and check for it
Term
why is the coefficient of determination not a good measure to compare multiple regression models
Definition
it cannot decrease by inclusion of more variables into the model
Term
what is the adjusted R squared
Definition
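a version of R squared that is penalized for the number of predictors in the model, so it can decrease when an uninformative variable is added
can be interpreted as the % reduction in the residual variance of the model relative to the variance of the observed data y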
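One common formula for the adjusted R squared is sketched below, with n observations and p predictors (excluding the intercept). The function name and the example numbers are assumptions.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R squared for n observations and p predictors (intercept excluded)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Hypothetical comparison: adding a third predictor leaves R squared unchanged,
# so the adjusted value drops
adj_two_predictors = adjusted_r_squared(0.80, n=21, p=2)
adj_three_predictors = adjusted_r_squared(0.80, n=21, p=3)
```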
Term
how can we check the goodness of fit in a multiple regression model
Definition
-check the model assumptions (linearity, Gaussian residuals, homogeneity of variance)
-check model fit (Wald test p-value, outliers, leverage and influential observations)
-compare models (adjusted R squared, ANOVA, AIC)
Term
what is AIC
Definition
Akaike information criterion
it's an alternative to R squared used to compare regression models
the model with the lower value is the better fitting model
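The usual definition is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L the maximized likelihood. The sketch below applies it to two made-up models; the log-likelihood values are hypothetical and not taken from any real fit.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical models: the larger one fits slightly better but uses 2 more parameters
aic_small = aic(log_likelihood=-100.0, n_params=3)
aic_large = aic(log_likelihood=-99.5, n_params=5)

preferred = "small" if aic_small < aic_large else "large"
```

Here the small gain in likelihood does not make up for the two extra parameters, so the smaller model has the lower AIC.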
Term
what is logistic regression
Definition
an extension of the Pearson chi-square test, used to investigate the relation of a binary outcome to one or more predictors
Term
true or false: the residuals in logistic regression model are Gaussian
Definition
false
unlike linear regression, they are not Gaussian
Term
what does the slope of a logistic regression model represent
Definition
the log odds ratio; exponentiating the slope gives the odds ratio
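For a single binary predictor, the fitted logistic regression slope equals the log of the odds ratio from the 2x2 table, so the relation can be sketched without fitting a model. The counts are invented for illustration.

```python
from math import exp, log

# Hypothetical 2x2 table
exposed_cases, exposed_noncases = 30, 70
unexposed_cases, unexposed_noncases = 10, 90

odds_exposed = exposed_cases / exposed_noncases
odds_unexposed = unexposed_cases / unexposed_noncases
odds_ratio = odds_exposed / odds_unexposed

# On the logit scale: intercept = log odds in the unexposed, slope = log odds ratio
intercept = log(odds_unexposed)
slope = log(odds_ratio)

# Back-transforming gives the odds ratio and the predicted risk in the exposed
odds_ratio_from_slope = exp(slope)
risk_exposed = 1 / (1 + exp(-(intercept + slope)))
```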
Term
what is survival analysis
Definition
the outcome of interest is the time from a certain starting point to the occurrence of an event
sometimes called "time to event" analysis
Term
what is right censoring
Definition
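in a survival analysis, when some animals have not experienced the outcome of interest by the end of their follow-up, so we only know that their time to event is longer than the observed time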
Term
what is an uninformative censor
Definition
the probability that an animal is censored is not related to the probability that it experiences the outcome of interest
Term
what is an administrative censor
Definition
a form of right censoring
when animals enter the study at different times, but the study ends at the same time, so not all animals are followed for the same amount of time and some have not experienced the event when the study ends
Term
what is censoring
Definition
in survival analysis when for part of the study population, the time to the event is not known
Term
what is interval censoring
Definition
when the exact time to event is not known, only that it occurred within an interval (e.g. between two examinations)
Term
what is the Kaplan-Meier estimator and how is it used
Definition
it is an estimator for the survival probability
it is the probability of surviving from a start point to a particular point in time
can be used when survival and censor times are known exactly
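A minimal sketch of the Kaplan-Meier product-limit calculation, assuming exact event and censoring times. The function name and follow-up data are invented; animals censored at an event time are counted as still at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk           # conditional survival at t
        curve.append((t, survival))
    return curve

# Hypothetical follow-up of six animals (0 = censored)
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```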
Term
what does the Kaplan-Meier method assume
Definition
that animals lost to follow-up at a given time survive longer than animals that die at that same time (censored animals still count as at risk at that time)
Term
what does the logrank test allow us to do
Definition
we can compare survival curves of two groups
the test statistic follows a chi-square distribution