| Term 
 
        | What is Induction? |  | Definition 
 
        | The derivation of general ideas from specific observations - not used by scientists; scientists instead use hypothesising and experimentation |  | 
        |  | 
        
        | Term 
 
        | What is Hypothetico-deductive reasoning? |  | Definition 
 
        | Observations lead to plausible hypotheses, which we then attempt to falsify. If we cannot prove them false, they are good hypotheses, but not necessarily right |  | 
        |  | 
        
        | Term 
 
        | What is a Theory? |  | Definition 
 
        | A general set of ideas or rules used to explain a group of observations |  | 
        |  | 
        
        | Term 
 
        | What is a Paradigm shift? |  | Definition 
 
        | A change in the way we think about a subject |  | 
        |  | 
        
        | Term 
 
        | What is a Null Hypothesis? |  | Definition 
 
        | H0 - the form of a hypothesis that we formally test; it predicts that nothing will happen |  | 
        |  | 
        
        | Term 
 
        | What is an Alternative hypothesis? |  | Definition 
 
        | H1, A specific prediction about an experiment |  | 
        |  | 
        
        | Term 
 
        | What is Nominal data? |  | Definition 
 
        | Data in categories with names |  | 
        |  | 
        
        | Term 
 
        | What is Count data? |  | Definition 
 
        | Data that rises in whole-number steps (counts)   Is treated as non-parametric |  | 
        |  | 
        
        | Term 
 
        | What is Ordinal data? |  | Definition 
 
        | Non-quantitative ranked data, normally used in questionnaires   Is treated as non-parametric but can be transformed |  | 
        |  | 
        
        | Term 
 
        | What is Continuous data? |  | Definition 
 
        | Quantitative measurements on a continuous scale   Treated as parametric |  | 
        |  | 
        
        | Term 
 
        | What are Descriptive statistics? |  | Definition 
 
        | Measures calculated from a data set which summarise some characteristics of the data |  | 
        |  | 
        
        | Term 
 
        | Measures of central tendency |  | Definition 
 
        | The mean, median and mode |  | 
        |  | 
        
        | Term 
 
        | What is a Histogram? |  | Definition 
 
        | A graph showing the total number of quantitative observations in each of a series of numerically ordered categories |  | 
        |  | 
        
        | Term 
 
        | What is the Sum of Squares (SS)? |  | Definition 
 
        | Total of all the squared deviates in a data set; squaring removes the minus sign. SS shows the magnitude of the variability but not the direction |  | 
        |  | 
        
        | Term 
 
        | What is Variance? |  | Definition 
 
        | s² - the average size of the squared deviates in a sample - an estimate of the population variance |  | 
        |  | 
        
        | Term 
 
        | What is Standard deviation? |  | Definition 
 
        | s - the average size of deviates in a data set. |  | 
        |  | 
        
        | Term 
 
        | What is a Population? |  | Definition 
 
        | All individuals in a group |  | 
        |  | 
        
        | Term 
 
        | What is a Sample? |  | Definition 
 
        | A sub-set of a population, meant to represent it |  | 
        |  | 
        
        | Term 
 
        | What is a Normal distribution? |  | Definition 
 
        | Bell-shaped, Gaussian; about 68% of all data points lie within one SD of the mean |  | 
        |  | 
        
        | Term 
 
        | Standard error of the mean |  | Definition 
 
        | A measure of the confidence we have in our sample mean as an estimate of the real mean |  | 
        |  | 
        
        | Term 
 
        | What is Skew? |  | Definition 
 
        | If skewed to the right, there is a long tail to the right, and vice versa for the left |  | 
        |  | 
        
        | Term 
 
        | What are Parametric tests? |  | Definition 
 
        | Tests which make many assumptions |  | 
        |  | 
        
        | Term 
 
        | What are Non-parametric tests? |  | Definition 
 
        | Tests which make fewer assumptions |  | 
        |  | 
        
        | Term 
 
        | What is a Poisson distribution? |  | Definition 
 
        | A distribution where a maximum possible count is far above the mean, resulting in a skew |  | 
        |  | 
        
        | Term 
 
        | What is a Binomial distribution? |  | Definition 
 
        | A distribution where the maximum count is close to the mean |  | 
        |  | 
        
        | Term 
 
        | Bar charts |  | Definition 
 
        | Used for visualising differences |  | 
        |  | 
        
        | Term 
 
        | Line graphs |  | Definition 
 
        | Used for visualising trends |  | 
        |  | 
        
        | Term 
 
        | What is Precision? |  | Definition 
 
        | A measurement is not precise if there is an unbiased (random) measurement error |  | 
        |  | 
        
        | Term 
 
        | What is Accuracy? |  | Definition 
 
        | A measurement is accurate if it is free from bias; bias occurs when there is a systematic error in your measurements |  | 
        |  | 
        
        | Term 
 
        | What is a Confounding effect? |  | Definition 
 
        | A confounding effect is something that influences your results in a way that can be confused with the effect you are studying |  | 
        |  | 
        
        | Term 
 
        | What is a Threshold effect? |  | Definition 
 
        | Effects of a variable are only visible once above a certain point |  | 
        |  | 
        
        | Term 
 
        | What is a Ceiling effect? |  | Definition 
 
        | Effects of a variable are only visible below a certain point |  | 
        |  | 
        
        | Term 
 
        | Independent samples t-test |  | Definition 
 
        | A statistical test designed to test for a difference between the means of two samples of continuous data |  | 
        |  | 
        
        | Term 
 
        | What is a Type I error? |  | Definition 
 
        | The rejection of the null hypothesis when it is true |  | 
        |  | 
        
        | Term 
 
        | What is a Type II error? |  | Definition 
 
        | The failure to reject the null hypothesis when it is false |  | 
        |  | 
        
        | Term 
 
        | What is Pseudoreplication? |  | Definition 
 
        | The use of non-independent data points as if they were independent |  | 
        |  | 
        
        | Term 
 
        | What is a Paired t-test? |  | Definition 
 
        | A test designed for samples that are not independent of each other, normally used to examine change |  | 
        |  | 
        
        | Term 
 
        | What is Homogeneity of variance? |  | Definition 
 
        | If the variance is homogeneous, it is the same in each sample |  | 
        |  | 
        
        | Term 
 
        | What is a Chi-squared test? |  | Definition 
 
        | A test which is used to examine differences between observed and expected counts |  | 
        |  | 
        
        | Term 
 
        | Pearsons correlation coefficient |  | Definition 
 
        | The statistic used to test the significance of correlations between two variables. Can only be used with linear relationships and normal distributions |  | 
        |  | 
        
        | Term 
 
        | Spearmans rank correlation coefficient |  | Definition 
 
        | Non-parametric correlation test |  | 
        |  | 
        
        | Term 
 
        | One-way ANOVA |  | Definition 
 
        | Tests the null hypothesis that the sample means are not different |  | 
        |  | 
        
        | Term 
 
        | What is the Kruskal-Wallis test? |  | Definition 
 
        | Non-parametric one way ANOVA |  | 
        |  | 
        
        | Term 
 
        | What is ANCOVA? |  | Definition 
 
        | Combines anova and regression |  | 
        |  | 
        
        | Term 
 
        | What makes a good hypothesis? |  | Definition 
 
        | Clear, Precise, Plausible, Able to produce testable predictions |  | 
        |  | 
        
        | Term 
 
        | Sum of squares formula |  | Definition 
 
        | SS = \sum_{i=1}^{n} (x_i - \bar{x})^2
 |  | 
        |  | 
        
        | Term 
 
        | Variance formula |  | Definition 
 
        | s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
 |  | 
        |  | 
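The two formula cards above can be checked with a short script. A minimal sketch in Python (the data values are made up for illustration):

```python
import statistics

# Hypothetical sample (values made up for illustration)
x = [4.0, 6.0, 5.0, 7.0, 8.0]
mean = statistics.mean(x)  # x-bar

# Sum of squares: total of the squared deviates
ss = sum((xi - mean) ** 2 for xi in x)

# Sample variance: s^2 = SS / (n - 1)
variance = ss / (len(x) - 1)

# Standard deviation: s = sqrt(s^2)
sd = variance ** 0.5

print(ss, variance, sd)
```

The same values come back from `statistics.variance` and `statistics.stdev`, which use n - 1 as well.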
        
        | Term 
 
        | 95% of samples are within |  | Definition 
        | ±1.96 standard deviations of the mean |  | 
        |  | 
        
        | Term 
 
        | Standard error of mean formula |  | Definition 
        | SEM = s ÷ √n (the standard deviation divided by the square root of the sample size) |  | 
        |  | 
        
        | Term 
 
        | Which tests have more statistical power? |  | Definition 
        | Parametric tests |  | 
        |  | 
        
        | Term 
 
        | Parametric test standard assumptions |  | Definition 
 
        | Independence, homogeneity of variance, and normality of the data
 |  | 
        |  | 
        
        | Term 
 
        | Alternative t test if the variances are not the same |  | Definition 
        | Welch's t-test |  | 
        |  | 
        
        | Term 
 
        | To test if the variances are the same? |  | Definition 
        | An F-test: divide the larger variance by the smaller and compare against the F distribution |  | 
        |  | 
        
        | Term 
 
        | If data is not normal, you can transform it by... |  | Definition 
 
        | Taking the square root or arcsine of all the points, eliminating a right skew; squaring all the points, eliminating a left skew
 |  | 
        |  | 
        
        | Term 
 
        | Alternative T-test if the data is not normal |  | Definition 
        | The Mann-Whitney U test (Wilcoxon two-sample test) |  | 
        |  | 
        
        | Term 
 | Definition 
 
        | adjust for chance of a type 1 error |  | 
        |  | 
        
        | Term 
 
        | In regression, we analyse |  | Definition 
 
        | the effect of one variable on another variable |  | 
        |  | 
        
        | Term 
 
        | How do you work out the Probability of two independent events occurring? |  | Definition 
 
        | Multiply the probabilities of the two events by each other: P(A and B) = P(A) × P(B) |  | 
        |  | 
        
        | Term 
 
        | How do you work out the probability of two non-independent events occurring? |  | Definition 
 
        | Multiply the probability of one event occurring by the probability of the other event occurring IF the first event occurs. 
 P(A and B) = P(A) · P(B|A)
 |  | 
        |  | 
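The two multiplication rules above can be sketched in Python; the probabilities here are made up for illustration:

```python
# Hypothetical probabilities (illustration only)
p_a = 0.5          # P(A)
p_b = 0.2          # P(B)
p_b_given_a = 0.3  # P(B|A), for the non-independent case

# Independent events: P(A and B) = P(A) * P(B)
p_independent = p_a * p_b

# Non-independent events: P(A and B) = P(A) * P(B|A)
p_dependent = p_a * p_b_given_a

print(p_independent, p_dependent)
```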
        
        | Term 
 
        | What is the difference between sampling with or without replacement? |  | Definition 
 
        | The probability of drawing a particular sample from a group differs between draws depending on whether you replace each sample back into the group after removing it. 
 This makes replacement very important for statistical conditional probability
 |  | 
        |  | 
        
        | Term 
 
        | What is Sampling bias? |  | Definition 
 
        | This is when you take a sample that does not represent the true population - perhaps due to biological differences in the species - meaning you will get biased, skewed data |  | 
        |  | 
        
        | Term 
 
        | What is a Bonferroni correction? |  | Definition 
 
        | Doing multiple parallel tests will often produce a Type I error purely because of the number of tests, so you perform a Bonferroni correction. 
 You divide the significance threshold (e.g. 0.05) by the number of tests that were independently performed.
 
 This gives a lower significance threshold and a much lower chance of a Type I error
 |  | 
        |  | 
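The correction above is a single division. A minimal sketch in Python, with a made-up number of tests:

```python
alpha = 0.05   # overall significance threshold
n_tests = 10   # hypothetical number of parallel tests

# Bonferroni correction: each individual test must now pass p < alpha / n_tests
corrected_threshold = alpha / n_tests

print(corrected_threshold)
```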
        
        | Term 
 
        | How can you correct a type II error? |  | Definition 
 
        | You can perform more repetitions or take more samples, as it is just to do with statistical power. With more statistical power you are able to reject or retain the null hypothesis with statistical confidence
 |  | 
        |  | 
        
        | Term 
 
        | What is Overfitting? |  | Definition 
 
        | This is when you will have too many variables and assumptions in a statistical test. This means that random statistical noise that would usually be insignificant, is assumed to be significant because of all the statistical variables |  | 
        |  | 
        
        | Term 
 
        | What is the binomial probability distribution of certain number of events (i) in a certain (n) number of trials where (p) is the probability of a certain event outcome in a singular trial? |  | Definition 
 
        | [image] 
 This explains the probability of seeing a certain number (i) of one outcome in a certain number of total trials (n), where (p) is the probability of the outcome we are looking at
 |  | 
        |  | 
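The binomial probability described above (the [image] is the standard formula) can be sketched in Python; the coin-toss example is made up for illustration:

```python
import math

def binomial_prob(i, n, p):
    """P(exactly i occurrences of an outcome in n trials,
    where p is the probability of that outcome in one trial)."""
    return math.comb(n, i) * p ** i * (1 - p) ** (n - i)

# e.g. the probability of exactly 3 heads in 10 fair coin tosses
prob = binomial_prob(3, 10, 0.5)
print(prob)
```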
        
        | Term 
 
        | What does the poisson distribution equation look like? |  | Definition 
 
        | [image] 
 This is where (i) is the number of events we are working out the probability of seeing and (m) is the mean number of times the event occurs
 |  | 
        |  | 
        
        | Term 
 
        | When do you use the binomial distribution probability test and when do you use poisson? |  | Definition 
 
        | The binomial should be used when there is a fixed number of trials in the experiment. Poisson should be used if it is open ended
 |  | 
        |  | 
        
        | Term 
 
        | When do you do a one tailed test or a two-tailed test? |  | Definition 
 
        | One-tailed is when you expect the data to trend in a certain direction away from the average. 
 Two-tailed is when the data may go either way and you are not sure which; the significance level is then split (halved) between the two tails.
 |  | 
        |  | 
        
        | Term 
 
        | How do you perform a chi squared test for independence? |  | Definition 
 
        | First calculate the expected values [image]
 then take them away from the observed values and square the differences.
 
 Then divide each squared difference by its expected value and sum them (χ² = Σ(O−E)²/E); comparing this statistic against the χ² distribution gives a probability which may or may not be below 0.05
 
 If it is below 0.05 the association is significant and the null hypothesis of independence is rejected
 |  | 
        |  | 
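The steps above can be sketched as a short script. The 2×2 table of counts is made up for illustration:

```python
# Hypothetical 2x2 table of observed counts (illustration only)
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count for each cell: row total * column total / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (O - E)^2 / E over all cells
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R-1)(C-1)
print(chi2, df)
```

The statistic would then be compared against the χ² distribution with `df` degrees of freedom to obtain the p value.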
        
        | Term 
 
        | How to work out the degrees of freedom when there are multiple rows and columns? |  | Definition 
 
        | (R-1)(C-1) Is the way to work that out |  | 
        |  | 
        
        | Term 
 
        | When testing data's significance using a chi squared non-independence test, what is the result? |  | Definition 
 
        | When the final p value is below 0.05 the result is significant, but it only tells you that the variables are NOT independent (i.e. they are associated) - it says nothing about the size or direction of the effect
 |  | 
        |  | 
        
        | Term 
 
        | What test can be used to test for significant difference between two different sets of data that are not normal? |  | Definition 
 
        | Wilcoxon two sample test. The null hypothesis is that the data sets are not statistically different and any differences are random.
 |  | 
        |  | 
        
        | Term 
 
        | When is a binomial distribution normal? |  | Definition 
 
        | When p=0.5 for the events occurring and the curve is symmetrical |  | 
        |  | 
        
        | Term 
 
        | How is the bell of a normally distributed curve defined? |  | Definition 
 
        | The curve is centred on the mean, and the standard deviation defines the width of the bell
        |  | 
        
        | Term 
 
        | How do you test for Normality of data? |  | Definition 
 
        | Shapiro Wilk Test for the normality of data.
 Test statistic W and p value.
 If the value for p is above the 0.05 value, then the data is normal.
 |  | 
        |  | 
        
        | Term 
 
        | What is the difference between parameters and statistics? |  | Definition 
 
        | Parameters describe the POPULATION (usually unknown, so they are assumed or estimated) and statistics are the KNOWN results calculated from the sample you have taken from the population. 
 For a normal distribution, a population has a mean of μ and standard deviation σ, while a sample has a mean of x̄ and a standard deviation of s.
 |  | 
        |  | 
        
        | Term 
 
        | What is the equation for Standard deviation? |  | Definition 
        | s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} |  | 
        |  | 
        
        | Term 
 
        | How do you work out how different you statistical data is from the assumed population reality? |  | Definition 
 
        | With the mean, you are able to work this out by calculating the Standard error of the mean. 
 This is the (standard deviation)÷(square root of the sample size)
 |  | 
        |  | 
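The SEM calculation above is one line of code. A minimal sketch in Python, with made-up data:

```python
import statistics

# Hypothetical sample (values made up for illustration)
x = [12.0, 15.0, 11.0, 14.0, 13.0]

# Standard error of the mean = standard deviation / sqrt(sample size)
sem = statistics.stdev(x) / len(x) ** 0.5
print(sem)
```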
        
        | Term 
 
        | What is the central limit theorem? |  | Definition 
 
        | The idea that the means of repeated samples are normally distributed, whatever the shape of the underlying data; this is why 95% of sample means fall within ±1.96 standard errors of the true mean
        |  | 
        
        | Term 
 
        | What is meant by the 95% confidence interval of a mean? |  | Definition 
 
        | This is the range mean ±1.96 standard errors, within which we are 95% confident the true population mean lies
        |  | 
        
        | Term 
 
        | What is the sample size limit for similarities in population and sample sd to be assumed? |  | Definition 
 
        | If a sample size is above 30, the sample standard deviation can be assumed similar enough to the population standard deviation, so the 95% confidence interval of the mean can use ±1.96. If the sample is below 30, instead of 1.96 we use a value called the t value.
 This is derived by taking the degrees of freedom (n-1) and then looking down the p=0.975 column of a t-distribution table.
 |  | 
        |  | 
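The large-sample case above (n ≥ 30, so 1.96 is used rather than a t value) can be sketched in Python; the data are made up for illustration:

```python
import statistics

# Hypothetical large sample: 30 made-up data points, so 1.96 applies
x = [float(v) for v in range(50, 80)]

mean = statistics.mean(x)
sem = statistics.stdev(x) / len(x) ** 0.5

# 95% confidence interval for the mean: mean +/- 1.96 * SEM
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
print(ci_low, ci_high)
```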
        
        | Term 
 
        | What is the basic normal distribution test for difference between two means? |  | Definition 
 
        | [image] 
 Often having to double the resulting p value for the Z test statistic, as it is often a two-tailed test.
 |  | 
        |  | 
        
        | Term 
 
        | In the anova test, what is meant by treatment effect and residual effect? |  | Definition 
 
        | The residual effect is how much each individual sample differs from its group mean, and the treatment effect is how much the group mean differs from the grand mean
        |  | 
        
        | Term 
 
        | What are the two sets of DF in the anova test? |  | Definition 
 
        | One is the df of the groups used (groups − 1). The other df is the total number of samples − the number of groups
 |  | 
        |  | 
        
        | Term 
 
        | What is the significance of the F value in the the ANOVA test? |  | Definition 
 
        | The F threshold is based upon the two degrees-of-freedom values. The specific F value is the treatment mean squared deviate ÷ the residual mean squared deviate
 
 If the specific value is higher than the threshold then the difference is significant
 |  | 
        |  | 
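The treatment/residual breakdown above can be sketched as a one-way ANOVA F calculation in Python; the three groups of values are made up for illustration:

```python
import statistics

# Hypothetical data: three groups (illustration only)
groups = [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]

all_vals = [v for g in groups for v in g]
grand_mean = statistics.mean(all_vals)

# Treatment SS: how far each group mean is from the grand mean
ss_treat = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Residual SS: how far each value is from its own group mean
ss_resid = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)

df_treat = len(groups) - 1              # groups - 1
df_resid = len(all_vals) - len(groups)  # total samples - number of groups

# F = treatment mean square / residual mean square
f_value = (ss_treat / df_treat) / (ss_resid / df_resid)
print(f_value)
```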
        
        | Term 
 
        | What is the non-parametric equivalent of the ANOVA test? |  | Definition 
 
        | Kruskal-Wallis Test - the one-way analysis of variance for sets of independent data with equal or different sample sizes.
 Used if the data are not normal or the variances are unequal (either one or both).
 The test statistic is χ2, along with the among- and within-group squared deviates and the p value to know if the result is significant.
 |  | 
        |  | 
        
        | Term 
 
        | What is a two-way ANOVA used for? |  | Definition 
 
        | You are checking whether two or more factors each have a significant effect on a certain test variable (and whether they interact) 
 [image]
 |  | 
        |  | 
        
        | Term 
 
        | What is the most common form of transformation and why would you do it? |  | Definition 
 
        | Take the log10 of the non-normal data, as this may then give a normal distribution. Do this to be able to perform parametric tests, as they have much more statistical power
 |  | 
        |  | 
        
        | Term 
 
        | What is the difference between a t test and a paired t test? |  | Definition 
 
        | A paired t test has more statistical power. A normal t test just compares the means of the two groups.
 A paired t test compares the mean of the within-pair differences against 0
 |  | 
        |  | 
        
        | Term 
 
        | What is the non-parametric version of the paired t test? |  | Definition 
 
        | Wilcoxon signed rank test. Test statistic = V
 |  | 
        |  | 
        
        | Term 
 
        | What is the equation for working out correlation, r? |  | Definition 
 
        | [image] 
 Gives the correlation coefficient
 
 This will be Pearson's rank coefficient
 It is a different equation for the Spearman's rank
 |  | 
        |  | 
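Pearson's r (the [image] above is the standard formula) can be sketched in Python; the paired measurements are made up for illustration:

```python
import statistics

# Hypothetical paired measurements (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]

mx, my = statistics.mean(x), statistics.mean(y)

# r = sum of cross-products of deviates / sqrt(SSx * SSy)
r = (sum((a - mx) * (b - my) for a, b in zip(x, y))
     / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5)

r_squared = r ** 2  # coefficient of determination
print(r, r_squared)
```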
        
        | Term 
 
        | What is the coefficient of determination? |  | Definition 
 
        | This is the correlation value squared. It represents the proportion of the variance in one variable that is explained by the variance in the other variable
 |  | 
        |  | 
        
        | Term 
 
        | How do you test for significance of the correlation coefficient? |  | Definition 
 
        | Work out the standard error of the correlation: [image]
 
 And then divide the correlation coefficient by the standard error of the correlation.
 If this value is larger than the corresponding t value for your df in the p=0.975 column (correlation significance testing is two-tailed), then the correlation is significant.
 
 In R this will be given as a p value, and the null hypothesis is that the correlation is not significant
 |  | 
        |  | 
        
        | Term 
 
        | How do you work out the slope for regression? |  | Definition 
 
        | The slope is just: [image] 
 And the intercept is the value of y where x = 0 (where the line crosses the y axis)
 |  | 
        |  | 
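The slope and intercept calculation above can be sketched in Python (the [image] is the usual least-squares slope formula); the data are made up for illustration:

```python
# Hypothetical paired data (illustration only); the true line here is y = 2x + 1
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

mx = sum(x) / len(x)
my = sum(y) / len(y)

# slope b = sum((x - mx)(y - my)) / sum((x - mx)^2)
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))

# intercept a: the value of y where x = 0
a = my - b * mx
print(b, a)
```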
        
        | Term 
 
        | How do you test for significant regression? |  | Definition 
 
        | Base it upon the results of correlation. If the correlation is significant, so is the regression, and vice versa |  | 
        |  | 
        
        | Term 
 
        | What does epsilon, ε, show? |  | Definition 
 
        | In linear models, ε is the error term: the model will always incur some error. The error is assumed to be the same no matter what the other values are |  | 
        |  | 
        
        | Term 
 
        | What is linear model of regression? |  | Definition 
        | y = α + βx + ε |  | 
        |  | 
        
        | Term 
 
        | What is the linear model of the t test? |  | Definition 
        | y = α + βx + ε, where x is a categorical variable with two levels (the two groups) |  | 
        |  | 
        
        | Term 
 
        | What is the linear model of the ANOVA test? |  | Definition 
        | y = α + βx + ε, where x is a categorical factor with several levels (one group mean per level) |  | 
        |  | 
        
        | Term 
 
        | What is the linear model for a two way ANOVA? |  | Definition 
 
        | y = α + βx1 + γx2 + ε (one term per factor) |  | 
        |  | 
        
        | Term 
 
        | How do you assess the fit of the model? |  | Definition 
 
        | The mean of the squared deviations of the actual values of y from the values predicted by the model 
 The further the data values are from the model values, the worse the fit of the model
 |  | 
        |  | 
        
        | Term 
 
        | How do you reduce overfitting? |  | Definition 
 
        | You produce a minimum adequate model: the linear model with the fewest variables in it. Only include the variables that really make a difference, and ignore minimal-effect variables, otherwise they will disrupt your results
 |  | 
        |  | 
        
        | Term 
 
        | What does the + mean in a linear model? |  | Definition 
 
        | It just means that in the model, that variable is included. Does not mean mathematical addition
 |  | 
        |  | 
        
        | Term 
 
        | How does adding more variables to a linear model effect the value of the sum of squared deviates? |  | Definition 
 
        | Adding more variables will ALWAYS reduce the residual sum of squared deviates 
 Therefore if the improvement is not significant, the minimum adequate model should be picked over the model with more variables
 |  | 
        |  | 
        
        | Term 
 
        | What is the logistic function equation and graph for models that will have an upper and lower maximum? |  | Definition 
        | y = e^η ÷ (1 + e^η), giving an S-shaped curve with an upper and a lower asymptote |  | 
        |  | 
        
        | Term 
 | Definition 
 
        | This is when you take an η value and predict a y value by putting the η value into a link function. This is used in generalised models and looks like this:
 |  | 
        |  | 
        
        | Term 
 
        | What data can be used in parametric tests? |  | Definition 
 
        | Continuous normal data 
 If it is count data, it is not parametric and therefore you cannot do a parametric test
 |  | 
        |  | 
        
        | Term 
 
        | What form should linear regression lines be in? |  | Definition 
        | y = a + bx (a straight line: intercept plus slope × x) |  | 
        |  | 
        
        | Term 
 
        | How can you quickly tell the difference between binomially and poisson distributed data? |  | Definition 
 
        | Poisson-distributed data will have some expected values massively higher than the real data; binomial expected values will be pretty close
 |  | 
        |  | 
        
        | Term 
 
        | What is the difference between the Mann-Whitney and Wilcoxon? |  | Definition 
 
        | Both are non-parametric versions of the t test. 
 The Mann-Whitney is used for independent data.
 
 The Wilcoxon signed-rank version is used for paired data
 |  | 
        |  | 
        
        | Term 
 
        | How do you find outliers in R? |  | Definition 
 
        | Plot the data on a Cleveland plot |  | 
        |  | 
        
        | Term 
 
        | How do you find homogeneity of variance errors in R? |  | Definition 
 
        | Plot the data on conditional box plot |  | 
        |  | 
        
        | Term 
 
        | How do you find errors of normality in your data on R? |  | Definition 
 
        | Plot the data in a histogram |  | 
        |  | 
        
        | Term 
 
        | How do you find errors of too many zeros in your data in R? |  | Definition 
 
        | Plot data into a Frequency histogram |  | 
        |  | 
        
        | Term 
 
        | How do you find errors in interactions of data in R? |  | Definition 
 
        | Plot data into a conditional plot |  | 
        |  | 
        
        | Term 
 
        | What is a t test for normally distributed data but have unequal variances? |  | Definition 
        | Welch's t-test |  | 
        |  | 
        
        | Term 
 
        | How do you work out the F value in an ANOVA test? |  | Definition 
 
        | Divide the treatment mean square by the residual mean square |  | 
        |  | 
        
        | Term 
 
        | What are the three important R commands you may need? |  | Definition 
 
        | str = structure of the data columns 
 head = first few data lines
 dim = size of the data matrix
 |  | 
        |  | 
        
        | Term 
 
        | How should a visual basic excel file be saved? |  | Definition 
        | As a .xlsm file |  | 
        |  | 
        
        | Term 
 
        | How should most excel files be saved? |  | Definition 
 
        | .xlsx, .csv or .txt 
 If it is a visual file then .xlsm
 |  | 
        |  | 
        
        | Term 
 
        | How do you fix a cell in an excel formula when dragging copying the cell formula? |  | Definition 
 
        | Use a $ sign in front of the cell you are fixing in the formula |  | 
        |  | 
        
        | Term 
 
        | What is a link function? |  | Definition 
 
        | This is the relationship between the value of y in a linear model and a value η, which represents some or all of the variables in the linear model. 
 The link function is just the relationship between the two and will help predict values for y with increasing or decreasing values for x in the model.
 
 Will create the S-shaped curve with asymptotes that never reach y=0 or y=1
 |  | 
        |  |