Term
|
Definition
| Application of statistical reasoning and methods to biological, medical and public health problems. |
|
|
Term
| What is the role of biostatistics? |
|
Definition
1. To design studies. 2. To develop hypotheses. 3. Descriptive statistics: to describe data (exploratory data analysis: order, group, summarize and graph). 4. Hypothesis testing: to apply statistical methods to test competing hypotheses. |
|
|
Term
|
Definition
| It refers to the difference between the true value and the observed or measured value. |
|
|
Term
|
Definition
| Variation of the measurements around their average, measured by the standard deviation (sigma for a population, s for a sample). |
|
|
Term
| What types of studies exist? |
|
Definition
1. Experimental: RCT and basic science. 2. Observational: longitudinal, cross-sectional, retrospective and prospective. |
|
|
Term
|
Definition
| It is a measure of central tendency. It is the value that represents the 50th percentile (p50), or second quartile (Q2): 50% of the observations fall below it. It is more stable than the mean because it is insensitive to outliers. |
|
|
Term
| What is the relationship between mean and median in a positively skewed distribution (skewed to the right)? |
|
Definition
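| In a positively skewed distribution the mean is greater than the median, because the long right tail pulls the mean upward. |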
|
Term
| What does the relationship between the mean and the median tell us? |
|
Definition
| This relationship can be used to assess the symmetry of the distribution. |
|
|
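A minimal sketch of how the mean and the median separate in a right-skewed sample. The data values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large.
values = [2, 3, 3, 4, 5, 6, 40]

m = mean(values)      # pulled upward by the extreme value 40
med = median(values)  # middle value of the sorted data, unaffected by 40

print(f"mean = {m}, median = {med}")
print("mean > median suggests positive (right) skew:", m > med)
```

In a symmetric sample the two values would be close to each other.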
Term
| What is the use of a Logarithmic Scale? |
|
Definition
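| A log scale spaces the axis (usually the Y axis) so that equal distances represent a constant multiplier rather than a constant difference. This allows changes of very different magnitudes (e.g. biological measurements) to be plotted on the same graph. Log transformations are also used in data analysis. |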
|
|
Term
| What is a p-value and what is assumed for its calculation? |
|
Definition
It is a probability: Pr(sample statistic is at least as extreme as the observed statistic | Ho is true). The assumptions needed to calculate p are: 1. Random sampling. 2. The assumptions of the statistical model are valid. 3. No other bias is present. |
|
|
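As an illustration of the definition above, the following sketch computes an exact one-sided p-value for a binomial outcome. The scenario (8 successes in 10 trials, Ho: p = 0.5) and the helper name `binomial_upper_tail` are assumptions made for this example.

```python
from math import comb

def binomial_upper_tail(k, n, p0):
    """Pr(X >= k) when X ~ Binomial(n, p0), i.e. computed assuming Ho is true."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# Hypothetical data: 8 successes observed in 10 trials, Ho: p = 0.5.
p_value = binomial_upper_tail(8, 10, 0.5)
print(f"one-sided p-value = {p_value:.4f}")
```

The probability is calculated entirely under Ho, which is why the validity of the model assumptions matters.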
Term
|
Definition
| It is a measure of uncertainty associated with the occurrence of an event. It ranges from 0 to 1. |
|
|
Term
| When are 2 outcomes statistically independent? |
|
Definition
| Two outcomes are statistically independent if and only if Pr (A and B)= Pr (A)* Pr (B). This is a JOINT Pr. |
|
|
Term
| When are two outcomes mutually exclusive? |
|
Definition
| Two outcomes are mutually exclusive if and only if Pr (A and B)= 0. This is a joint probability. |
|
|
Term
| What is the addition rule of Probability? |
|
Definition
| Addition rule (or): Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B). Special case: when A and B are mutually exclusive, Pr(A and B) = 0, so Pr(A or B) = Pr(A) + Pr(B). |
|
|
Term
| What is the conditional rule of probability? |
|
Definition
| Conditional rule (given): Pr(A|B) = Pr(A and B) / Pr(B). Pr(B) must be greater than 0. |
|
|
Term
| What is the multiplication rule of probability? |
|
Definition
Multiplication rule (and): Pr(A and B) = Pr(B) * Pr(A|B). Special case: when A and B are independent, Pr(A|B) = Pr(A), so Pr(A and B) = Pr(A) * Pr(B). |
|
|
Term
| What are the 3 rules of probability? |
|
Definition
1. Addition (Or) 2. Conditional (Given) 3. Multiplication (and) |
|
|
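The three rules can be checked on a small example. This is a sketch using one roll of a fair die; the events A (even number) and B (greater than 3) are hypothetical choices for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                 # one roll of a fair die
A = {x for x in outcomes if x % 2 == 0}     # even: {2, 4, 6}
B = {x for x in outcomes if x > 3}          # greater than 3: {4, 5, 6}

def pr(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# 1. Addition (or): Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
p_a_or_b = pr(A) + pr(B) - pr(A & B)

# 2. Conditional (given): Pr(A|B) = Pr(A and B) / Pr(B)
p_a_given_b = pr(A & B) / pr(B)

# 3. Multiplication (and): Pr(A and B) = Pr(B) * Pr(A|B)
p_a_and_b = pr(B) * p_a_given_b

# Independence check: holds only if Pr(A and B) = Pr(A) * Pr(B)
independent = pr(A & B) == pr(A) * pr(B)

print(p_a_or_b, p_a_given_b, p_a_and_b, independent)
```

Here A and B are neither independent nor mutually exclusive, so the general forms of the rules are needed.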
Term
|
Definition
| It is a conditional probability that is useful when not all data is available. |
|
|
Term
| What is a probability distribution? |
|
Definition
It is a list of the probabilities of all the possible values that a random variable can take. Some of the most common probability distributions are: 1. For discrete variables (gaps): binomial (dichotomous outcomes) and Poisson (rare events). 2. For continuous outcomes: Gaussian and exponential. |
|
|
Term
|
Definition
| It is a counting technique. In a permutation, the order of the events matters. |
|
|
Term
|
Definition
| It is a counting technique. Order of events DOES NOT matter. |
|
|
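A short sketch of the two counting techniques, using Python's `math.perm` and `math.comb` (available from Python 3.8). The numbers (choosing 3 items out of 5) are hypothetical.

```python
from math import comb, factorial, perm

n, k = 5, 3

# Permutations: order matters, n! / (n - k)!
n_permutations = perm(n, k)

# Combinations: order does not matter, n! / (k! * (n - k)!)
n_combinations = comb(n, k)

# Each combination of k items can be ordered in k! ways.
assert n_permutations == n_combinations * factorial(k)

print(n_permutations, n_combinations)
```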
Term
| What conditions need to be met to use Poisson as an approximation to the binomial? |
|
Definition
n has to be large (n > 20) and p has to be small. |
|
|
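The quality of the approximation can be inspected numerically. The sketch below compares binomial and Poisson probabilities for hypothetical values n = 100 and p = 0.02, with the Poisson mean set to n * p.

```python
from math import comb, exp, factorial

n, p = 100, 0.02   # hypothetical: large n, small p
lam = n * p        # Poisson mean used for the approximation

def binomial_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return exp(-lam) * lam**k / factorial(k)

# Largest absolute difference between the two probabilities for k = 0..10
max_diff = max(abs(binomial_pmf(k) - poisson_pmf(k)) for k in range(11))

for k in range(5):
    print(k, round(binomial_pmf(k), 4), round(poisson_pmf(k), 4))
print("max difference:", round(max_diff, 4))
```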
Term
| What conditions need to be met to use the normal distribution as an approximation of the binomial? |
|
Definition
|
|
Term
|
Definition
| Visual aid that compares multiple percentiles (quantiles) of two distributions. When the two distributions are equal, the points fall on a straight line. Any departure from the line may indicate a difference in spread or shape. With a Q-Q plot we can compare a sample distribution vs. a theoretical one, sample vs. sample, and theoretical vs. theoretical. |
|
|
Term
| What is statistical inference? |
|
Definition
| It refers to the methods that are applied to information that is drawn from a sample to make inferences about a population. |
|
|
Term
|
Definition
| It is a numerical descriptor of the population (usually denoted with Greek letters). |
|
|
Term
|
Definition
| Numerical descriptor of a sample. |
|
|
Term
| What are the different types of randomization that can be used in a clinical trial? |
|
Definition
1. Unrestricted randomization. 2. Restricted randomization: used in small trials; it ensures that the groups are balanced. 3. Stratified randomization. 4. Matched-pair randomization: used to balance the composition of the groups on the characteristics on which matching is made. |
|
|
Term
| What is the use of randomization? |
|
Definition
| It is a method used in the design of a study to balance known and unknown confounders across groups. Only when the units under study (individual, family, community) are randomized can we be confident that the observed changes are due to the intervention and not to underlying differences between the units. Whenever possible, an intervention should be randomized and double-blinded. |
|
|
Term
| What are some of the most important sampling distributions? |
|
Definition
1. Sample mean. 2. Difference of 2 sample means. 3. Sample proportion. 4. Difference of 2 sample proportions. The first two are used for continuous variables and the last two for binary outcomes. |
|
|
Term
| What assumptions are made regarding the Central Limit Theorem? |
|
Definition
1. The mean of the sampling distribution of the sample mean equals the population mean. 2. The SD of the sampling distribution (the standard error) equals the population SD divided by the square root of n. 3. The distribution of the values of the sample mean will be approximately normal. The CLT is used when sampling from a non-normal distribution with a large n. The theorem is important because many populations are not normally distributed, but we would still like to be able to make inferences about them. |
|
|
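The behaviour described by the Central Limit Theorem can be seen in a small simulation. This is a sketch under assumed settings: an exponential population with mean 1 and SD 1, samples of size 50, and 2000 repetitions; the seed is fixed only to make the run reproducible.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)

n = 50          # size of each sample
n_samples = 2000

# Population: exponential with rate 1, so mean = 1 and SD = 1 (clearly non-normal).
sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

mean_of_means = mean(sample_means)   # should be close to the population mean, 1
sd_of_means = stdev(sample_means)    # should be close to 1 / sqrt(n)
expected_se = 1 / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_se, 3))
```

A histogram of `sample_means` would look approximately normal even though the individual observations are strongly skewed.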
Term
| What is reflected by the 95% CI? |
|
Definition
| The interval represents the uncertainty associated with a point estimate. |
|
|
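A minimal sketch of a 95% CI for a mean, assuming the population SD is known so that Z can be used. The sample mean, SD and n are hypothetical values.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary data; sigma is assumed to be the known population SD.
x_bar, sigma, n = 120.0, 15.0, 36

z = NormalDist().inv_cdf(0.975)   # about 1.96 for a two-sided 95% interval
se = sigma / sqrt(n)              # standard error of the mean

lower = x_bar - z * se
upper = x_bar + z * se

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

A larger n shrinks the standard error and therefore narrows the interval.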
Term
| What is hypothesis testing? |
|
Definition
| Statistical aid that helps to decide between competing hypotheses by examining a sample from a population. |
|
|
Term
| What are the steps to perform a hypothesis test? |
|
Definition
1. Select the probability model. 2. Set up Ho based on the problem being investigated, and set Ha, deciding whether the test will be one-sided or two-sided. 3. Select the test statistic (Z or t). 4. Select the critical region (alpha). 5. Compare the observed value to the hypothesized value. 6. Make the statistical decision and state the conclusion. |
|
|
Term
| What happens to the CI when t is used instead of Z? |
|
Definition
| Z is used to calculate the CI when the population SD is known; if it is not, the sample SD is used and t is the appropriate statistic. With t, the CI is wider because the SD must itself be estimated from the sample. As n gets larger, t approximates Z. |
|
|
Term
| When is the sampling distribution normal or approximately normal? |
|
Definition
The sampling distribution is normal or approximately normal when: 1. The sample is taken from a population with a normal distribution. 2. n is large, so that the Central Limit Theorem holds. |
|
|
Term
| What is the Variance Ratio Test? |
|
Definition
| It is an F test that is used to test the Ho of equal variances. Based on the result, we decide whether or not to pool the variances of the two samples. |
|
|
Term
| What is the sample statistic for pre-post designs? |
|
Definition
| d, the mean of the paired (pre vs. post) differences. Remember, here the samples are not independent. |
|
|
Term
|
Definition
It is the Analysis of Variance: a statistical technique for comparing the means of multiple populations by partitioning the variability in the data by source, between groups and within groups. The F test is the ratio of between-group to within-group variability. |
|
|
Term
| Which are the two criteria needed to determine the sample size? |
|
Definition
| 1. Precision: how much variability can be tolerated around the estimate, i.e. the width of the CI. 2. Power: Pr(Reject Ho | Ha is true). Thus, a power of 80% means that if the true difference between the 2 groups is the one expected, the study will give a significant result 4 out of 5 times that it is conducted. |
|
|
Term
| What is a regression analysis? |
|
Definition
| It is a statistical method that allows us to describe a response or outcome (Y) as a simple function of one or more predictor variables (X). |
|
|
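A minimal sketch of the simplest case, a straight line Y = intercept + slope * X fitted by ordinary least squares. The data are hypothetical and the closed-form formulas are written out by hand rather than taken from a statistics package.

```python
# Hypothetical data: one predictor X and one outcome Y.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squares around the means
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

slope = sxy / sxx                  # change in Y per unit change in X
intercept = y_bar - slope * x_bar  # expected Y when X = 0

print(f"Y = {intercept:.2f} + {slope:.2f} * X")
```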
Term
| What is an Adjusted Variable Plot? |
|
Definition
| Visual Aid used to assess linearity, patterns and outliers in a model. |
|
|
Term
| What are the inference procedures that can be used for all 4 models? |
|
Definition
1. Estimate Bj and SE(Bj): 95% CI, hypothesis test, p-value. 2. Estimate a linear combination (lincom), which combines multiple coefficients and is useful for splines and interaction terms: 95% CI, hypothesis test, p-value. 3. Compare the extended vs. the null model: test the hypothesis that multiple Bj equal 0. |
|
|
Term
| How are inferences for a Bj done in an MLR? |
|
Definition
| Using a partial t-test (only used for linear regression). |
|
|
Term
| How are inferences for a Bj done in a Cox, LR or LLR? |
|
Definition
| NO T TEST HERE. We use a Wald test or Z test. |
|
|
Term
| What are the inferences for Bj+Bk in a Regression Model? |
|
Definition
Use lincom for a hypothesis test about a specific linear combination of B's. In an MLR the test statistic is a t; in all the other models it is a Z. Ho: Bj + Bk = 0. |
|
|
Term
| To test null vs extended? |
|
Definition
| We would use an F test for MLR. For the other models (LR, LLR, Cox) the F test is not applicable, and we would test this hypothesis with a likelihood ratio test (LRT). |
|
|
Term
| How can we decide if the variable is a confounder or a mediator? |
|
Definition
| This decision is not taken with statistics, this decision is based on prior knowledge. |
|
|
Term
| What is effect modification? |
|
Definition
| It is when the coefficient for one X variable differs depending on the value of one or more other Xs. This concept applies to all 4 models. |
|
|
Term
| What type of variables does ANOVA allow us to explore? |
|
Definition
|
|
Term
| What will ANCOVA allow us to explore? |
|
Definition
|
|
Term
| How do we check for model fit with an MLR? |
|
Definition
1. Residual plots. 2. AVP. We look for nonlinear patterns, influential points, and non-constant variance. |
|
|
Term
| How do we check for model fit with an LR? |
|
Definition
1. Inspect observed vs. predicted values. 2. Hosmer-Lemeshow goodness-of-fit test. Look for patterns, influential points and changing variance. Check the effect of influential points. |
|
|
Term
| How do we check for model fit with an LLR and Cox? |
|
Definition
| Use Complementary Log Log plots. |
|
|
Term
| How do we select a regression model? |
|
Definition
Question of interest and purpose of the model. Check for model fit. Criteria used: cross-validated measures and AIC, for all 4 models. Do not use R squared. |
|
|
Term
| What is the binomial distribution? |
|
Definition
It is a probability distribution for a series of random events, each of which can take only 2 values. It assumes that there are only 2 outcomes, that the probability of the event is the same on every trial, and that the events are independent. |
|
|
Term
|
Definition
| Given a population with any non-normal distribution, the sampling distribution of the sample mean, computed from all possible samples of size n from this population, will be approximately normal when n is large. |
|
|
Term
| What is the Maximum Likelihood Estimate? |
|
Definition
| It is the value of the parameter that makes the observed data most likely; in that sense it is the best estimate of the parameter based on the sample. |
|
|
Term
| What is the interpretation of the 95% CI? |
|
Definition
| We are 95% confident that the interval covers the true population mean. |
|
|
Term
| What are the properties of the t distribution? |
|
Definition
1. Mean = median = mode. 2. Symmetrical about the mean. 3. It is a family of distributions, determined by n-1 degrees of freedom (df). 4. It approaches the standard normal (Z) distribution as n-1 approaches infinity. |
|
|
Term
|
Definition
| It is a statistical technique in which the sample is treated as if it were the whole population. A random sample of the same size is taken from it with replacement and the statistic is computed. The process is repeated many times (e.g. 1000) and a histogram of the results is made, which approximates the sampling distribution of the statistic. |
|
|
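The resampling procedure described above can be sketched in a few lines. The data values, the choice of the mean as the statistic, and the fixed seed are assumptions made for illustration.

```python
import random
from statistics import mean, stdev

random.seed(1)

# Hypothetical observed sample, treated as if it were the whole population.
data = [4, 8, 15, 16, 23, 42]
n_resamples = 1000

boot_means = []
for _ in range(n_resamples):
    # Draw a sample of the same size, with replacement, from the data.
    resample = random.choices(data, k=len(data))
    boot_means.append(mean(resample))

# The spread of the resampled means estimates the standard error of the mean.
boot_se = stdev(boot_means)

print(round(mean(boot_means), 2), round(boot_se, 2))
```

A histogram of `boot_means` approximates the sampling distribution of the sample mean.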
Term
| What is the F test in ANOVA? |
|
Definition
| It is a global test. Ho is for equality of all means. |
|
|
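As a sketch of how the global F statistic is built from between-group and within-group variability, the following computes it by hand for three small hypothetical groups. The function name `f_statistic` is an illustrative choice.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical data for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
F = f_statistic(groups)
print(f"F = {F:.2f} with {len(groups) - 1} and {9 - len(groups)} df")
```

A large F indicates that the group means differ by more than the within-group variability would explain; it does not say which means differ.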
Term
| What are the assumptions for ANOVA? |
|
Definition
1. Obs are independent 2. Constant variance 3. Distribution approx normal |
|
|
Term
| What are the steps for an ANOVA? |
|
Definition
1. Bartlett's test for equal variance (an ANOVA assumption). 2. F test for equal means. 3. Estimate the differences by multiple comparisons, with a Bonferroni correction, for all possible pairwise comparisons. |
|
|
Term
| What is a correlation analysis (r)? |
|
Definition
Analysis that shows the direction and strength of the linear association between X and Y. r = -1 is a perfect negative linear relation, 0 is no linear relation, and 1 is a perfect positive linear relation. The value of r is independent of the units of measurement. r can be substantially influenced by a small fraction of outliers. |
|
|
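A minimal sketch of the Pearson correlation coefficient computed by hand, including a check that changing the units of X leaves r unchanged. The data and the function name `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)

# Same X expressed in different units (e.g. metres converted to centimetres)
r_rescaled = pearson_r([xi * 100 for xi in x], y)

print(round(r, 4), round(r_rescaled, 4))
```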
Term
| What is residual analysis? |
|
Definition
It is a check on the assumptions of a regression. Check: 1. Residuals are normally distributed on a histogram. 2. Random scatter on the plot of residuals vs. X. 3. Random scatter on the plot of residuals vs. fitted values. If the assumptions fail: look for outliers, or transform the data. |
|
|
Term
| What is the coefficient of determination (r squared)? |
|
Definition
| It is the proportion of the variation in Y that is explained by X. It is not a good measure for selecting a model. |
|
|
Term
| What is the principal objective of many intervention trials? |
|
Definition
| To estimate the size of the effect of the intervention on the outcomes. This estimate is subject to error, which derives from bias and from sampling error. Sampling error usually decreases as n increases; bias does not. |
|
|
Term
| Name the 2 criteria used to determine sample size |
|
Definition
1. Precision: how accurate your estimate needs to be. This is reflected in the width of the CI around the estimate: the narrower the CI, the smaller the sampling error. 2. Power: alternatively, use the power needed to detect an effect of a given magnitude. Power depends on delta, alpha, n, and whether the test is one- or two-sided. |
|
|
Term
|
Definition
| Power curves are visual aids that help researchers when deciding on sample size or power. They are usually constructed for 1 or 2 key outcomes. |
|
|
Term
| Aim of sample size calculation based on hypothesis testing |
|
Definition
Have large enough samples to detect a difference in population means (or in population proportions) |
|
|
Term
| What is the Aim of sample size calculation based on precision? |
|
Definition
have a large enough sample with which to estimate a population mean (or difference in means) or proportion (or difference in proportions) within a narrow interval with high reliability |
|
|
Term
| What are the Ho for sample size calculation based on HT for one and 2 samples |
|
Definition
Δ = μa - μ0 or Δ = pa - p0 for one sample; Δ = μ1 - μ2 or Δ = p1 - p2 for two samples. |
|
|
Term
| What is the goal of sample size calculation? |
|
Definition
Perform a study with large enough sample size and sufficient power to detect (through hypothesis testing) a meaningful difference Δ |
|
|
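As a rough illustration of how Δ, alpha and power combine, the sketch below uses a common normal-approximation formula for comparing two means with a two-sided test: n per group = 2 * (z for 1 - alpha/2 + z for power)^2 * sigma^2 / Δ^2. The formula, the function name `n_per_group`, and all input values are assumptions for this example and are not taken from the cards above; it also assumes a known common SD and equal group sizes.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a difference in means of delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical planning values: detect a difference of 5 units when the SD is 10.
n_80 = n_per_group(delta=5, sigma=10, power=0.80)
n_90 = n_per_group(delta=5, sigma=10, power=0.90)

print(n_80, n_90)
```

Asking for more power, or for a smaller detectable Δ, increases the required sample size.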
Term
| On what should sample size be based? |
|
Definition
Sample size calculation should be informed by previous investigations |
|
|
Term
| Other than the statistical basis, on what else should sample size be determined? |
|
Definition
| Choice of sample size depends on a balance of reasonable assumptions, time, effort, and expense |
|
|
Term
| What can be some of the potential effects of having a clinical trial with a small sample size? |
|
Definition
1. No effect found, but with a wide CI. 2. A large effect, but no power to detect delta. |
|
|
Term
| What are other factors affecting sample size? |
|
Definition
1. Interim analyses. 2. Equivalence trials (require a large sample size). 3. Loss to follow-up. |
|
|