Term: probability
Definition: the relative frequency of events

Term: binomial distribution
Definition: a simple model that assumes only two outcomes are possible; it models the probability of observing k events among a sample of n individuals
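As a quick sketch of the binomial model (Python with scipy; the values n = 10 and p = 0.3 are made up for illustration):

```python
from scipy.stats import binom

# P(k events among n individuals); n and p are arbitrary example values
n, p = 10, 0.3
probs = [binom.pmf(k, n, p) for k in range(n + 1)]

print(round(probs[3], 4))    # probability of exactly k = 3 events (about 0.267)
print(round(sum(probs), 4))  # the pmf sums to 1 over all possible k
```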
        
Term: Bernoulli distribution
Definition: the case when n = 1 and we are interested in the probability of observing a case with a single draw from a binomially distributed population

Term: name three ways to tell if something is Gaussian distributed
Definition:
-investigate the histogram
-Q-Q plot
-apply a significance test (Shapiro-Wilk test)

Term: Q-Q plot
Definition: compares the quantiles of an observed frequency distribution to the quantiles of an expected distribution; used for testing for a Gaussian distribution

Term: Shapiro-Wilk test
Definition: the null hypothesis assumes the sample is Gaussian; if the test is significant, we reject the null and conclude the sample is not Gaussian
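A minimal sketch of the Shapiro-Wilk test in Python (scipy; the samples are simulated, not real data):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)
gaussian_sample = rng.normal(loc=50, scale=5, size=200)   # simulated Gaussian data
skewed_sample = rng.exponential(scale=5, size=200)        # simulated skewed data

# null hypothesis: the sample comes from a Gaussian population;
# a significant p-value (< 0.05) means we reject normality
_, p_gauss = shapiro(gaussian_sample)
_, p_skew = shapiro(skewed_sample)
print(round(p_gauss, 4), round(p_skew, 6))
```

The skewed sample should give a very small p-value, leading to rejection of normality.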
        
Term: de Moivre-Laplace theorem
Definition: as the success probability/prevalence of a binomial distribution approaches 0.5, or as the sample size n increases, the binomial distribution becomes more symmetric and approaches a Gaussian distribution
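The theorem can be illustrated numerically (Python/scipy sketch; n = 100 and p = 0.5 are arbitrary choices), comparing an exact binomial probability with its normal approximation:

```python
import math
from scipy.stats import binom, norm

# with p near 0.5 and n large, Binomial(n, p) ~ Normal(np, sqrt(np(1-p)))
n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = binom.cdf(55, n, p)            # exact binomial P(X <= 55)
approx = norm.cdf(55.5, mu, sigma)     # 0.5 added: the continuity correction
print(round(exact, 4), round(approx, 4))
```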
        
Term: deviation
Definition: the deviation of each observation from the mean

Term: variance
Definition: the average of the squared deviations of the observations from the mean

Term: coefficient of variation
Definition: the standard deviation expressed as a percentage of the mean
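These three quantities can be computed directly (Python sketch; the observations are made up):

```python
import statistics

observations = [4.0, 5.5, 6.1, 5.0, 4.4]   # hypothetical measurements
mean = statistics.mean(observations)

deviations = [x - mean for x in observations]   # deviation of each observation
variance = statistics.variance(observations)    # average squared deviation (n-1 denominator)
sd = statistics.stdev(observations)
cv = 100 * sd / mean                            # coefficient of variation, % of the mean

print(round(sum(deviations), 10))   # raw deviations always sum to 0
print(round(cv, 1))
```

Because the raw deviations always sum to zero, the squares are what carry the information about spread.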
        
Term: the simplest method in R to estimate the mean and its confidence interval
Definition: the t.test() function

Term: addition rule (probability)
Definition: when two events are mutually exclusive (cannot occur at the same time), the probability of either occurring is the sum of the probabilities of the individual events

Term: multiplication rule (probability)
Definition: when two events are independent (the occurrence of one does not affect the other), the probability of both events occurring is the product of their individual probabilities
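Both rules amount to simple arithmetic (Python sketch; the card and coin examples are illustrative, not from the deck):

```python
# addition rule: mutually exclusive events, P(A or B) = P(A) + P(B)
# e.g. drawing one card: it cannot be both an ace and a king
p_ace, p_king = 4 / 52, 4 / 52
p_ace_or_king = p_ace + p_king
print(round(p_ace_or_king, 4))   # 0.1538

# multiplication rule: independent events, P(A and B) = P(A) * P(B)
# e.g. two fair coin flips both landing heads
p_heads = 0.5
p_two_heads = p_heads * p_heads
print(p_two_heads)               # 0.25
```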
        
Term: conditional probability
Definition: used when two events are not independent; the probability of A occurring when we know B has occurred

Term: when is the binomial distribution used (list 2)
Definition:
-when investigating a binary response (only two possible outcomes)
-for analyzing proportions and making inferences about them
        
Term: what do we do when data is skewed right in a Gaussian distribution
Definition: use the lognormal distribution

Term: properties of the Gaussian distribution (6)
Definition:
-described by 2 parameters (mean, SD)
-unimodal
-symmetrical about the mean
-mean, median and mode are all equal
-if the SD doesn't change but the mean increases, the curve shifts right (if the mean decreases, it shifts left)
-decreasing the SD makes the curve thinner; increasing the SD makes it fatter
        
Term: properties of the t distribution (3)
Definition:
-symmetrical about the mean
-characterized by degrees of freedom
-with large degrees of freedom, it looks like the normal distribution

Term: properties of the chi-squared distribution (2)
Definition:
-can only take positive values; highly skewed
-characterized by degrees of freedom (approaches normal when large)

Term: properties of the F distribution (3)
Definition:
-the distribution of a ratio
-two separate degrees of freedom (numerator and denominator)
-tabulated probabilities relate to a ratio > 1
        
Term: two distributions used when we are dealing with discrete variables
Definition: the binomial and Poisson distributions

Term: the normal distribution is used for what sort of variable
Definition: continuous (numerical) variables

Term: when do we use a continuity correction in the normal distribution
Definition: when we use tables of the normal distribution to approximate the Poisson or binomial distribution
        
Term: what is the sampling distribution of the mean and what does it depend on
Definition: the extent to which a sample mean differs from the population mean; it depends on
-the size of the sample (larger means less error)
-the variability of the observations (error is greater if the sample is more diverse)

Term: what are the properties of the sampling distribution of the mean (3)
Definition:
-normally distributed if the parent distribution is normal (assume normality if the sample size is >30)
-the mean of the sampling distribution of the mean is the same as that of the parent population
-its standard deviation is known as the standard error of the mean (smaller with larger sample sizes)

Term: what is the difference between the standard error of the mean and the standard deviation
Definition: the SD measures the scatter of the observations, whereas the SEM measures the precision of the sample mean as an estimate of the population mean

Term: what is a confidence interval (for the mean)
Definition: a range of values, defined by upper and lower limits, within which we expect the true population mean to lie with a certain probability
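In R this is typically done with t.test(); an equivalent sketch in Python (scipy; the sample values are made up):

```python
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7])  # hypothetical data
n = len(sample)
mean = sample.mean()
sem = stats.sem(sample)   # standard error of the mean = sd / sqrt(n)

# 95% confidence interval based on the t distribution with n-1 degrees of freedom
low, high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(round(mean, 2), round(low, 2), round(high, 2))
```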
        
Term: what is a null hypothesis
Definition: the converse of the study hypothesis (we usually try to disprove it)

Term: what is an alternative hypothesis
Definition: states that there is a difference between parameter values but that the direction is not known (therefore it usually leads to a two tailed test); if we know one treatment can only be better and not worse, we may use a one sided test

Term: p-value
Definition: the chance of getting the observed effect if the null hypothesis is true

Term: type I error
Definition: when the two means are equal but we have rejected the null hypothesis when we should not have rejected it; we limit the probability of a type I error to be less than alpha (the significance level)

Term: type II error
Definition: when the two means differ but we have not rejected the null when we should have
-the probability of a type II error is designated by beta
-1 - beta is the power of a test
        
Term: what are the different types of t-test; give a brief description
Definition:
-one sample t-test: comparing a mean/expected value to a reference value
-two sample t-test: comparing the means/expected values of two independent populations
-Welch's test: a version of the two sample test for when the variances are unequal
-paired t-test: used when the data are not independent (paired), so it reduces to a one sample t-test
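A sketch of the four variants (Python/scipy; simulated data), including a check that the paired test reduces to a one sample test on the pairwise differences:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(10.0, 2.0, size=30)   # simulated measurements
b = rng.normal(11.5, 2.0, size=30)

# one sample t-test: does the mean of a differ from a reference value of 10?
t1, p1 = stats.ttest_1samp(a, popmean=10.0)

# two sample t-test (equal variances assumed)
t2, p2 = stats.ttest_ind(a, b)

# Welch's test: two sample t-test without assuming equal variances
tw, pw = stats.ttest_ind(a, b, equal_var=False)

# paired t-test: equivalent to a one sample t-test on the differences
tp, pp = stats.ttest_rel(a, b)
same_as_paired = stats.ttest_1samp(a - b, popmean=0.0)
print(bool(np.isclose(tp, same_as_paired.statistic)))   # True
```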
        
Term: what are the two main assumptions for using a t-test
Definition:
-the mean of the sample data is Gaussian distributed
-the unknown variance can be estimated by the sample variance

Term: describe the one sample t-test
Definition: tests whether the mean/expected value differs from a reference value

Term: describe the two sample t-test
Definition: used when we have data from two independent populations and want to compare the means/expected values

Term: when is Welch's test used
Definition: when using a two sample t-test but the variances are unequal; the standard error of the two sample t-test is modified

Term: describe the paired t-test
Definition: used when we want to use a two sample t-test but the data are paired (not independent), which reduces it to a one sample t-test

Term: two assumptions of the one sample t-test
Definition:
-the sample data come from a normally distributed population
-the values are representative of the population

Term: assumptions of the two sample t-test (3)
Definition:
-samples must be independent and representative of the population
-approximately normally distributed
-variances should be approximately equal
        
Term: what is the Wilcoxon rank sum test
Definition: a non-parametric alternative to the two sample t-test, used when its assumptions do not hold; the data are transformed to ranks before comparison
        
Term: assumptions of a paired t-test
Definition: the difference between the observations of each pair is approximately normally distributed

Term: what are the assumptions of the F test
Definition:
-samples are independent and come from normally distributed populations
-samples are representative of the population

Term: F test
Definition: tests for the equality of two variances

Term: what is Levene's test
Definition: used to compare two or more variances; the test statistic follows the F distribution
-less dependent on the assumptions of the F test
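Both variance tests sketched in Python (scipy; simulated samples with deliberately unequal spread):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(0, 1.0, size=40)
y = rng.normal(0, 3.0, size=40)   # clearly larger spread

# F test for equality of two variances: ratio of the sample variances,
# compared against the F distribution with (n1-1, n2-1) degrees of freedom
f_ratio = np.var(y, ddof=1) / np.var(x, ddof=1)
p_f = 2 * min(stats.f.sf(f_ratio, 39, 39), stats.f.cdf(f_ratio, 39, 39))

# Levene's test: same question, less dependent on normality
stat, p_levene = stats.levene(x, y)
print(p_f < 0.05, p_levene < 0.05)   # both should reject equal variances here
```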
        
Term: what does ANOVA stand for and what is it used for
Definition: analysis of variance; it compares the means of two or more groups by investigating their variances

Term: what does the one way ANOVA do
Definition: it is an extension of the two sample t-test for when we compare the means of more than two groups
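A minimal one way ANOVA sketch (Python/scipy; simulated groups, one with a shifted mean):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(5.0, 1.0, size=25)
group_b = rng.normal(5.0, 1.0, size=25)
group_c = rng.normal(7.0, 1.0, size=25)   # shifted mean

# one way ANOVA: do the means of three or more groups differ?
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(p_value < 0.05)   # the shifted group should make this significant
```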
        
Term: describe one way repeated measures ANOVA
Definition: an extension of the paired t-test for when we are comparing three or more treatments

Term: two way ANOVA
Definition: examines the effect of two factors on a response variable

Term: assumptions of the one way ANOVA
Definition:
-the variable of interest is numerical
-samples are independent and come from normally distributed populations

Term: what is Bonferroni's correction used for
Definition: when we reject the null in a one way ANOVA and we need to know which of the group means differ

Term: what are the most appropriate tests for comparing the means of one or more populations when we have continuous variables
Definition: the t-tests and ANOVA

Term: what tests should we use for categorical variables (i.e. binary)
Definition: chi square test, Fisher's exact test, Cochran-Armitage test, McNemar's test
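A sketch of the chi square and Fisher's exact tests on a hypothetical 2x2 table (Python/scipy; the counts are invented):

```python
import numpy as np
from scipy import stats

# hypothetical 2x2 table: rows = treatment/control, columns = diseased/healthy
table = np.array([[12, 28],
                  [25, 15]])

# chi square test of independence
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Fisher's exact test: preferred when expected counts are small
odds_ratio, p_fisher = stats.fisher_exact(table)
print(dof, round(odds_ratio, 2))   # dof = 1 for a 2x2 table
```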
        
Term: what does the Pearson correlation coefficient do
Definition: describes the strength of the linear relation (aka correlation) between two variables

Term: what is the purpose of a linear regression model
Definition: describes the linear relationship between two variables by means of a mathematical equation
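Both the correlation coefficient and the regression line can be obtained in one call (Python/scipy sketch; the x, y values are made up):

```python
import numpy as np
from scipy import stats

# hypothetical paired measurements with a roughly linear relation
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

result = stats.linregress(x, y)
print(round(result.rvalue, 3))                         # Pearson correlation coefficient
print(round(result.slope, 2), round(result.intercept, 2))  # fitted line y = a + bx
print(round(result.rvalue ** 2, 3))                    # coefficient of determination
```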
        
Term: what type of distribution is most appropriate for categorical variables
Definition: the binomial distribution

Term: describe how Fisher's exact test would be used
Definition: when we are testing for an association between categorical variables from independent groups with a small sample size (<20)

Term: when is the Cochran-Armitage test used
Definition: when we are testing for a trend in the proportions of categorical variables

Term: when is McNemar's test used
Definition: when we have paired groups of categorical variables and we want to test for agreement

Term: name three different types of chi squared tests
Definition:
-McNemar's test
-Cochran-Armitage test
-Fisher's exact test
        
Term: what would the value of the correlation coefficient be if there was perfect correlation
Definition: +1 (or -1 for a perfect negative correlation)

Term: what would the value of the correlation coefficient be if there was no correlation
Definition: 0

Term: what assumptions need to be made when testing the correlation coefficient
Definition:
-both variables (X and Y) are numeric
-at least one of the variables is normally distributed

Term: under what circumstances should we not calculate the correlation coefficient
Definition:
-when the relationship between the variables is non-linear
-the observations are not independent
-outliers are present
        
Term: what is the point of linear regression
Definition: to model a linear relation between an outcome variable and one or more predictor/explanatory variables

Term: is the outcome in a linear regression model the dependent or the independent variable
Definition: the dependent variable

Term: true or false: a linear correlation proves causation
Definition: false

Term: true or false: a linear regression model proves causation
Definition: false

Term: residuals
Definition: the differences between the observed outcome y and its model predicted value (ŷ)
        
Term: what assumptions need to be true for linear regression models
Definition:
-the residuals should be approximately Gaussian distributed
-the relationship between x and y is linear
-the observations are independent
-for each value of x, the population values of y are normally distributed

Term: what does a linear regression model describe and how
Definition: the relationship between 2 numerical variables, by determining the straight line that most closely approximates the data points on a scatter diagram

Term: if a data point in a linear regression model has high leverage, what might this imply
Definition: it may be an outlier; any point with leverage greater than 4/n should be investigated

Term: Cook's distance
Definition: a standardized measure of the change in the parameters of the regression equation if the data point were omitted
        
Term: at what distance according to Cook's distance is a point influential
Definition: a Cook's distance greater than 1 is commonly taken to indicate an influential point
        
Term: describe the coefficient of determination
Definition: measures the fit of the regression model: how much of the variation in the outcome is explained by the variation in the predictor variable; it is the square of the correlation coefficient
        
Term: what is the difference between simple and multiple linear regression
Definition:
-simple: only one predictor variable
-multiple: many predictor variables contribute to the explanation of an outcome in one model

Term: what is logistic regression and when is it used
Definition: used when we have a categorical or binary outcome; it models the influence of predictor variables
-it is an extension of the chi squared/Cochran-Armitage tests between a binary outcome and an ordered predictor variable
        
Term: what are the assumptions of multiple linear regression
Definition:
-there is a linear relationship between the response variable and each explanatory variable
-the residuals are independent (each individual appears once in the sample)
-the residuals are normally distributed with zero mean and constant variance

Term: what does it mean if a regression coefficient in a logistic regression model has a large standard error
Definition: it suggests possible collinearity among the predictors

Term: why is the coefficient of determination not a good measure to compare multiple regression models
Definition: it cannot decrease when more variables are included in the model

Term: what is the adjusted R squared
Definition: can be interpreted as the percentage reduction in the variance of the model predicted residuals compared with the residuals in the observed data y

Term: how can we check the goodness of fit of a multiple regression model
Definition:
-check the model assumptions (linearity, Gaussian residuals, homogeneity of variance)
-check the model fit (Wald test p-value; outlier, leverage and influential observations)
-compare models (adjusted R squared, ANOVA, AIC)
        
Term: AIC
Definition: Akaike information criterion; an alternative to R squared used to compare regression models
-the model with the lower value is the better fitting model

Term: what is logistic regression
Definition: an extension of the Pearson chi square test, used to investigate the relation of a binary outcome to multiple predictors

Term: true or false: the residuals in a logistic regression model are Gaussian
Definition: false; unlike linear regression, they are not Gaussian

Term: what does the slope of a logistic regression model represent
Definition: the change in the log odds of the outcome per unit change in the predictor (the log odds ratio)
        
Term: what is survival analysis
Definition: analysis in which the outcome of interest is the time from a certain starting point to the occurrence of an event; sometimes called "time to event" analysis

Term: censored
Definition: in a survival analysis, describes animals that never experience the outcome of interest

Term: what is uninformative censoring
Definition: when the probability that an animal is censored is not related to the probability that it experiences the outcome of interest

Term: what is administrative censoring
Definition: a form of right censoring; animals enter the study at different times but the study ends at the same time, so not all animals were followed for the same amount of time

Term: censoring
Definition: in survival analysis, when for part of the study population the time to the event is not known

Term: what is interval censoring
Definition: when the exact time to the event is not known but is approximated (known only to lie within an interval)

Term: what is the Kaplan-Meier estimator and how is it used
Definition: an estimator of the survival probability: the probability of surviving from a start point to a particular point in time
-can be used when the survival and censoring times are known exactly
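A hand-rolled sketch of the estimator in pure Python (illustrative only; in practice a library such as lifelines would be used, and the times and censoring flags below are invented):

```python
def kaplan_meier(times, events):
    """times: follow-up times; events: 1 = event observed, 0 = censored."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = sum(1 for tt, e in pairs if tt == t and e == 1)
        removed = sum(1 for tt, e in pairs if tt == t)
        if deaths > 0:
            # multiply by the fraction surviving this event time
            survival *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed   # deaths and censored animals leave the risk set
        i += removed
    return curve

# hypothetical survival times (months); events marks deaths (1) vs censored (0)
times = [2, 3, 3, 5, 7, 8, 8, 10]
events = [1, 1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Note how censored animals still count in the risk set at their censoring time but contribute no drop in the curve, matching the assumption that they survive at least as long as animals dying at the same time.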
        
Term: what does the Kaplan-Meier method assume
Definition: that animals lost to follow-up at a given time survive longer than those who die at that time

Term: what does the logrank test allow us to do
Definition: to compare the survival curves of two groups; the test statistic follows a chi square distribution