Shared Flashcard Set

Details

Experimental Design and Analysis 2
John Brookers
128
Other
Undergraduate 1
01/09/2017

Cards

Term
What is Induction?
Definition

The derivation of general ideas from specific observations - not used by scientists

 

Scientists will use hypothesising and experimentation

Term
What is Hypothetico-deductive reasoning?
Definition
Observations lead to plausible hypotheses, which we then attempt to falsify. If we cannot prove them false, they are good hypotheses, but not necessarily right
Term
What is a Theory?
Definition
A general set of ideas or rules used to explain a group of observations
Term
What is a Paradigm?
Definition
A way of thinking
Term
What is a Paradigm shift?
Definition
A change in the way we think about a subject
Term
What is a Null Hypothesis?
Definition
H0: the form of a hypothesis that we formally test. It predicts that nothing will happen (no effect or no difference)
Term
What is an Alternative hypothesis?
Definition
H1, A specific prediction about an experiment
Term
What is Nominal data?
Definition
Data in categories with names
Term
What is Discrete data?
Definition

Data that can only take whole-number (integer) values, such as counts

 

Is treated as non-parametric

Term
What is Ordinal data?
Definition

Non-quantitative ranked data, normally used in questionnaires

 

Is treated as non-parametric but can be transformed

Term
What is Continuous data?
Definition

Quantitative measurements on a continuous scale

 

Treated as parametric

Term
What are Descriptive statistics?
Definition
Measures calculated from a data set which summarise some characteristics of the data
Term
Measures of central tendency
Definition
Mean, median and mode
Term
Frequency histogram
Definition
A graph showing the total number of quantitative observations in each of a series of numerically ordered categories
Term
Distribution
Definition
Shape of a data set
Term
Sum of squares
Definition
Total of all the squared deviates in a data set. Squaring removes the minus signs, so SS shows the magnitude of the variability but not the direction
Term
Variance
Definition
s^2 - the average size of the squared deviates in a sample - an estimate of the population variance
Term
Standard deviation
Definition
s - the average size of deviates in a data set.
Term
Population
Definition
All individuals in a group
Term
Sample
Definition
A sub-set of a population, meant to represent it
Term
Normal distribution
Definition
Bell-shaped, Gaussian; about 68% of all data points are within one SD of the mean
Term
Standard error of the mean
Definition
A measure of the confidence we have in our sample mean as an estimate of the real mean
Term
Skew
Definition
If skewed to the right, there is a long tail to the right; likewise for the left
Term
Parametric tests
Definition
Tests which make many assumptions
Term
Non-parametric tests
Definition
Tests which make fewer assumptions
Term
Poisson distribution
Definition
A distribution where a maximum possible count is far above the mean, resulting in a skew
Term
Binomial distribution
Definition
A distribution where the maximum count is close to the mean
Term
Bar chart
Definition
Used for visualising differences
Term
Scatter graph
Definition
Used for visualising trends
Term
Measurement precision
Definition
A measurement is not precise if there is unbiased (random) measurement error
Term
Measurement accuracy
Definition
A measurement is accurate if it is free from bias, bias occurs when there is a systematic error in your measurements
Term
Confounding effects
Definition
A confounding effect is something that influences your results in a way that can be confused with the effect you are studying
Term
Floor effect
Definition
Effects of a variable are only visible once above a certain point
Term
Ceiling effect
Definition
Effects of a variable are only visible below a certain point
Term
Independent samples t-test
Definition
A statistical test designed to test for a difference between the means of two samples of continuous data
Term
Type I error
Definition
The rejection of the null hypothesis when it is true
Term
Type II error
Definition
The failure to reject the null hypothesis when it is false
Term
Pseudoreplication
Definition
The use of non-independent data points as if they were independent
Term
Paired design
Definition
A test design where samples are not independent of each other, normally used to examine change
Term
Homogeneity of variance
Definition
If the variance is homogeneous, it is the same in each sample
Term
Chi squared
Definition
A test which is used to examine differences between observed and expected counts
Term
Pearson's correlation coefficient
Definition
The statistic used to test the significance of correlations between two variables. Can only be used with linear relationships and normal distributions
Term
Spearman's rank correlation coefficient
Definition
Non-parametric correlation test
Term
ANOVA
Definition
Tests the null hypothesis that the sample means are not different
Term
Kruskal-Wallis test:
Definition
Non-parametric one way ANOVA
Term
ANCOVA
Definition
Combines ANOVA and regression
Term
Hypotheses should be
Definition
Clear, Precise, Plausible, Able to produce testable predictions
Term
X with a line over it=
Definition
mean
Term
What does Xi show?
Definition
All data points
Term
SS formula =
Definition
$SS = \sum_{i=1}^{n} (X_i - \bar{X})^2$
Term
Variance formula =
Definition
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Term
SD =
Definition
Square root of variance
Term
95% of data points in a normal distribution are within
Definition
1.96 SDs of the mean
Term
Standard error of mean formula
Definition
$SEM = \frac{s}{\sqrt{n}}$
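The SS, variance, SD and SEM formulas above can be checked by hand with a few lines of Python. This is only a sketch: the data values are made up for illustration and the variable names are my own.

```python
import math

# Hypothetical sample, chosen only for illustration
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mean = sum(data) / n                      # X with a line over it
ss = sum((x - mean) ** 2 for x in data)   # sum of squares (SS)
variance = ss / (n - 1)                   # s^2 = SS / (n - 1)
sd = math.sqrt(variance)                  # s = square root of the variance
sem = sd / math.sqrt(n)                   # SEM = s / sqrt(n)

print(mean, ss, variance, sd, sem)
```

For this sample the mean is 5 and SS is 32, so the variance is 32/7.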
Term
Which tests have more statistical power?
Definition
Parametric tests
Term
For t tests, df=
Definition
n1+n2-2
Term
Parametric test standard assumptions
Definition
Independence
Homogeneity of variance
Term
Alternative t test if the variances are not the same
Definition
Welch test
Term
To test if the variances are the same?
Definition
Levene's test
Term
If data is not normal, you can transform it by...
Definition
Taking the square root (or log) of all the points, which reduces a right skew
Squaring all the points, which reduces a left skew
Square root arcsine transforming all the points, which is used for proportions
Term
Alternative T-test if the data is not normal
Definition
Two-sample Wilcoxon test (also known as the Mann-Whitney test)
Term
Post hoc tests
Definition
Adjust for the increased chance of a Type I error when making multiple comparisons
Term
In regression, we analyse
Definition
The effect of one variable on another variable
Term
How do you work out the Probability of two independent events occurring?
Definition
Multiply the probabilities of the two events together: P(A and B) = P(A) × P(B)
Term
How do you work out the probability of two non-independent events occurring?
Definition
Multiply the probability of the first event occurring by the probability of the second event occurring IF the first event occurs.

P(A and B) = P(A) × P(B|A)
Term
What is the difference between sampling with or without replacement?
Definition
With replacement, each item is put back into the group after it is sampled, so the probabilities are the same for every draw. Without replacement, the item is not put back, so the probability of each later draw depends on what has already been removed.

This makes the distinction very important for conditional probability
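A small worked example may help here. The sketch below uses a made-up bag of marbles and compares drawing two reds with replacement (independent events) and without replacement (conditional probability). The numbers are hypothetical.

```python
from fractions import Fraction

# Hypothetical bag: 3 red marbles out of 10 in total
red, total = 3, 10

# With replacement: draws are independent, P(A and B) = P(A) * P(B)
p_with = Fraction(red, total) * Fraction(red, total)

# Without replacement: P(A and B) = P(A) * P(B|A),
# one red marble has gone, so 2 red remain out of 9
p_without = Fraction(red, total) * Fraction(red - 1, total - 1)

print(p_with)      # 9/100
print(p_without)   # 1/15
```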
Term
What is biased sampling?
Definition
This is when the sample you take does not represent the true population, for example because of biological differences within the species or the way individuals were picked, which means you will get systematically skewed data.
Term
What is a Bonferroni correction?
Definition
This is used when you run multiple tests in parallel, which raises the chance of at least one Type I error simply because so many tests are performed.

You divide the significance threshold (e.g. 0.05) by the number of tests that were performed.

This gives a lower significance threshold for each test and a much lower chance of getting a Type I error
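As a rough sketch of the correction, the p values below are invented for illustration. Each one is compared against the threshold divided by the number of tests.

```python
alpha = 0.05                                # usual significance threshold
p_values = [0.003, 0.020, 0.040, 0.300]     # hypothetical results of 4 tests

# Bonferroni: divide the threshold by the number of tests performed
threshold = alpha / len(p_values)

# A test only counts as significant if its p value is below the corrected threshold
significant = [p < threshold for p in p_values]

print(threshold)
print(significant)   # [True, False, False, False]
```

Note that 0.020 and 0.040 would have passed the uncorrected 0.05 threshold but fail the corrected one.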
Term
How can you correct a type II error?
Definition
You can perform more repetitions or take more samples, as a Type II error comes down to a lack of statistical power.
With more statistical power you are more likely to reject the null hypothesis when it is false
Term
What is overfitting?
Definition
This is when you have too many variables in a statistical model. Random statistical noise that would usually be insignificant is then treated as if it were a real effect because of all the extra variables
Term
What is the binomial probability distribution of certain number of events (i) in a certain (n) number of trials where (p) is the probability of a certain event outcome in a singular trial?
Definition
[image]

In words, this is the probability of seeing a certain number (i) of one outcome in a certain number of total trials (n), given the probability (p) of that outcome in a single trial
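The image for this card is missing. Assuming it showed the standard binomial formula, written with the card's own symbols (i, n, p) it would be:

```latex
P(i) = \binom{n}{i} \, p^{i} \, (1 - p)^{n - i},
\qquad
\binom{n}{i} = \frac{n!}{i! \, (n - i)!}
```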
Term
What does the poisson distribution equation look like?
Definition
[image]

This is where (i) is the number of events we are working out the probability of seeing and (m) is the mean number of times the event we are looking at occurs
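The image for this card is missing. Assuming it showed the standard Poisson formula, written with the card's own symbols (i, m) it would be:

```latex
P(i) = \frac{m^{i} \, e^{-m}}{i!}
```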
Term
When do you use the binomial distribution probability test and when do you use poisson?
Definition
The binomial should be used when there is a fixed number of trials in the experiment.
Poisson should be used if it is open ended
Term
When do you do a one tailed test or a two-tailed test?
Definition
One-tailed is when you expect the data to differ in one particular direction.

Two-tailed is when the data may go either way and you are not sure which. The significance level is then split in half between the two tails (e.g. 0.025 at each end).
Term
How do you perform a chi squared test for independence?
Definition
Calculate the expected values first
[image]
and then take them away from the observed values and square the difference.

Then divide each squared deviation by its expected value and add the results together to give the chi squared statistic. Comparing this with the chi squared distribution for your degrees of freedom gives a probability, which may or may not be below 0.05

If it is below 0.05 the difference is significant and the null hypothesis that the variables are independent is rejected
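The calculation can be sketched in Python as follows. The observed counts are made up, and the expected values are taken as (row total × column total) ÷ grand total, which is the standard form and is assumed to be what the missing image showed.

```python
# Hypothetical 2x2 table of observed counts (rows = groups, columns = outcomes)
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for r, row in enumerate(observed):
    for c, obs in enumerate(row):
        # Expected count if rows and columns were independent
        expected = row_totals[r] * col_totals[c] / grand_total
        # Squared deviation divided by the expected value
        chi_squared += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (R-1)(C-1)

print(chi_squared, df)   # 4.0 1
```

With 1 degree of freedom the 0.05 critical value of chi squared is 3.841, so a statistic of 4.0 would be significant in this made-up example.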
Term
How to work out the degrees of freedom when there are multiple rows and columns?
Definition
(R-1)(C-1) Is the way to work that out
Term
When testing data's significance using a chi squared non-independence test, what is the result?
Definition
When the p value is below 0.05, the result is significant; however, it only tells you that the variables are not independent.
It says nothing more about the relationship, it just means non-independence
Term
What test can be used to test for significant difference between two different sets of data that are not normal?
Definition
Wilcoxon two sample test.
The null hypothesis is that the two data sets are not statistically different and any differences are due to random chance.
Term
When is a binomial distribution normal?
Definition
When p=0.5 for the events occurring and the curve is symmetrical
Term
How is the bell of a normally distributed curve defined?
Definition
The mean gives the position of the peak (the centre of the curve) and the standard deviation gives the width of the curve
Term
How do you test for Normality of data?
Definition
Shapiro Wilk
Test for the normality of data.
Test statistic W and p value.
If the value for p is above 0.05, there is no significant departure from normality and the data can be treated as normal.
Term
What is the difference between parameters and statistics?
Definition
Parameters are the true (usually unknown) values that describe a population, and statistics are the KNOWN values calculated from the sample you have taken from the population, which are used to estimate the parameters.

For a normal distribution, a population will have a mean of μ and standard deviation σ, while a sample has a mean of x̄ and a standard deviation of s.
Term
What is the equation for Standard deviation?
Definition
[image]
Term
How do you work out how different your statistical data is from the assumed population reality?
Definition
With the mean, you are able to work this out by calculating the Standard error of the mean.

This is the (standard deviation)÷(square root of the sample size)
Term
What is the central limit theorem?
Definition
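This is the idea that the means of repeated samples from a population are approximately normally distributed when the samples are large enough, whatever the shape of the population itself. In a normally distributed curve, 95% of all the data will fall in the region of +/- 1.96 x standard deviation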
Term
What is meant by the 95% confidence interval of a mean?
Definition
This is the range, mean ±1.96 standard errors of the mean, within which we are 95% confident that the true population mean lies
Term
What is the sample size limit for similarities in population and sample sd to be assumed?
Definition
If a sample size is above 30, the sample standard deviation can be assumed to be similar enough to the population standard deviation, so the 95% confidence interval of a mean can be taken as the mean ±1.96 standard errors.
If the sample is below 30, instead of 1.96 we use a value called the t value.
This is found by taking the degrees of freedom (n-1) and looking down the p=0.975 column of a t-distribution table.
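For the large-sample case, a minimal sketch is shown below. It assumes the interval is built from the standard error of the mean, and the summary values (mean, sd, n) are invented. The small-sample t value is not computed here, as it would still come from a t table.

```python
import math
from statistics import NormalDist

# Hypothetical summary values for a sample with n > 30
mean, sd, n = 50.0, 10.0, 100

sem = sd / math.sqrt(n)             # standard error of the mean
z = NormalDist().inv_cdf(0.975)     # about 1.96 for a two-tailed 95% interval

lower = mean - z * sem
upper = mean + z * sem

print(round(z, 2), round(lower, 2), round(upper, 2))   # 1.96 48.04 51.96
```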
Term
What is the basic normal distribution test for difference between two means?
Definition
[image]

The p value from the Z test statistic often has to be doubled, as it is often a two-tailed test.
Term
In the anova test, what is meant by treatment effect and residual effect?
Definition
The residual effect is how much the individual sample differs from its group mean, and the treatment effect is how much the group mean differs from the grand mean
Term
What are the two sets of DF in the anova test?
Definition
One is the df of the groups used (groups-1)
The other df is the total number of samples - the number of groups
Term
What is the significance of the F value in the ANOVA test?
Definition
F threshold is based upon the two values of the degrees of freedom.
The specific F value is the treatment mean squared deviate ÷ residual mean squared deviate

If the specific value is higher than the threshold then the difference is significant
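The F calculation can be sketched for a one-way ANOVA as below. The three groups are made-up data, and the threshold itself would still be looked up from an F table using the two df values.

```python
# Hypothetical data: three treatment groups of three samples each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

ss_treatment = 0.0   # group means vs grand mean
ss_residual = 0.0    # individual samples vs their group mean
for g in groups:
    group_mean = sum(g) / len(g)
    ss_treatment += len(g) * (group_mean - grand_mean) ** 2
    ss_residual += sum((x - group_mean) ** 2 for x in g)

df_treatment = len(groups) - 1                 # groups - 1
df_residual = len(all_values) - len(groups)    # total samples - groups

# F = treatment mean square / residual mean square
f_value = (ss_treatment / df_treatment) / (ss_residual / df_residual)

print(df_treatment, df_residual, round(f_value, 2))   # 2 6 13.0
```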
Term
What is the non-parametric equivalent of the ANOVA test?
Definition
Kruskal-Wallis Test
The one-way analysis of the variance of sets of independent data with equal or different sample sizes.
Used if not normal data or unequal variances. The data can be either one of those or both for this to be used.
The test statistic is χ2, reported along with the among and within group squared deviates and the p value, which tells you whether the result is significant.
Term
What is a two-way ANOVA used for?
Definition
You are checking whether two factors, and the interaction between them, have a significant effect on a certain test variable

[image]
Term
What is the most common form of transformation and why would you do it?
Definition
Take the Log10 values of the non normal data as this may then give distributions of normal data.
Do this to be able to perform parametric tests as they have much more statistical power
Term
What is the difference between a t test and a paired t test?
Definition
Paired t test has more statistical power.
Normal t test will just compare the difference of means of the two groups.
Paired t test will compare the difference between the mean difference in group values and 0
Term
What is the non-parametric version of the paired t test?
Definition
Wilcoxon signed rank test
Test statistic = V
Term
What is the equation for working out correlation, r?
Definition
[image]

Gives the correlation coefficient

This will be Pearson's correlation coefficient
It is a different equation for Spearman's rank
Term
What is the coefficient of determination?
Definition
This is the correlation value squared.
It represents the proportion of the variance in one variable that is explained by the other variable
Term
How do you test for significance of the correlation coefficient?
Definition
Work out the standard error of the correlation:
[image]

And then divide the correlation coefficient by the standard error of the correlation.
If this value is larger than the t value for your df in the 0.975 column (as correlation significance testing is two-tailed), then the correlation is significant.

In R this will be given as a p value, and the null hypothesis is that there is no correlation
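The steps can be sketched as below with made-up x and y values. The images are missing, so the standard Pearson formula and the standard error sqrt((1 - r²) / (n - 2)) with df = n - 2 are assumed here rather than taken from the cards.

```python
import math

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products of the deviates
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)                 # Pearson's correlation coefficient
r_squared = r ** 2                             # coefficient of determination
se_r = math.sqrt((1 - r_squared) / (n - 2))    # assumed standard error of r
t_value = r / se_r                             # compare with the t table, df = n - 2

print(round(r, 3), round(r_squared, 3), round(t_value, 3))   # 0.775 0.6 2.121
```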
Term
How do you work out the slope for regression?
Definition
The slope is just: [image]

The intercept is then the value of y where the line crosses x=0 on the graph
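As a sketch, assuming the missing image showed the usual least-squares slope (sum of cross-products of the deviates divided by the sum of squares of x), the slope and intercept can be worked out like this. The data are hypothetical.

```python
# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: cross-products of deviates / sum of squares of x
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))

# The fitted line passes through (mean_x, mean_y), which fixes the intercept
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))   # 0.6 2.2
```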
Term
How do you test for significant regression?
Definition
Base it upon the results of the correlation. If the correlation is significant, so is the regression, and vice versa
Term
What does epsilon, ε, show?
Definition
ε is the error term. Linear models will always carry some error. The error is assumed to be the same no matter what the other values are
Term
What is the linear model of regression?
Definition
[image]
Term
What is the linear model of the t test?
Definition
[image]
Term
What is the linear model of the ANOVA test?
Definition
[image]
Term
What is the linear model for a two way ANOVA?
Definition
[image]
Term
How do you assess the fit of the model?
Definition
the mean of the squared deviations of the actual values of y from the predictions of the model

The further away the data values are from the model values, the worse the fit of the model
Term
How do you reduce overfitting?
Definition
You produce a minimum adequate model.
This will be the linear model with the least number of variables in it. Only include the variables that really make a difference otherwise it will disrupt your read. Ignore minimal effect variables
Term
What does the + mean in a linear model?
Definition
It just means that in the model, that variable is included.
Does not mean mathematical addition
Term
How does adding more variables to a linear model affect the value of the sum of squared deviates?
Definition
More variables will ALWAYS reduce the residual sum of squared deviates, even if the extra variables are only fitting noise

Therefore if the difference is not significant, the minimum adequate model should be picked over the model with more variables
Term
What is the logistic function equation and graph for models that will have an upper and lower limit?
Definition
[image]

[image]
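The images are missing. Assuming the card showed the standard logistic function, y = 1 / (1 + e^-η), a short sketch of its S shape is given below. The function name and the η values are illustrative only.

```python
import math

def logistic(eta):
    """Standard logistic function: maps any eta to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))

# y climbs from near 0 to near 1 but never reaches either limit
for eta in (-10, -2, 0, 2, 10):
    print(eta, round(logistic(eta), 4))
```

At η = 0 the curve is exactly halfway between its two limits, at y = 0.5.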
Term
What is a link function?
Definition
This is when you are taking a η value and you predict a y value by putting the η value into a link function.
This is used in generalised models and looks like this:
Term
What is parametric data?
Definition
Continuous normal data

If it is count data, it is not parametric and therefore you cannot do a parametric test
Term
What form should linear regression lines be in?
Definition
Always in the form y=mx+c
Term
How can you quickly tell the difference between binomially and poisson distributed data?
Definition
For Poisson distributed data, the maximum possible count is massively higher than the values actually seen in the data.
For binomial data, the maximum possible count will be pretty close to the values seen
Term
What is the difference between the Mann-Whitney and Wilcoxon?
Definition
Both are non-parametric versions of the t test.

The Mann-Whitney test is used for two independent samples; it is equivalent to the two-sample Wilcoxon test.

The Wilcoxon signed rank test is used for paired data, as the non-parametric version of the paired t test
Term
How do you find outliers in R?
Definition
Plot the data on a Cleveland plot
Term
How do you find homogeneity of variance errors in R?
Definition
Plot the data on conditional box plot
Term
How do you find errors of normality in your data on R?
Definition
Plot the data in a histogram
Term
How do you find errors of too many zeros in your data in R?
Definition
Plot data into a Frequency histogram
Term
How do you find errors in interactions of data in R?
Definition
Plot data into a conditional plot
Term
What is the t test for normally distributed data that have unequal variances?
Definition
Welch test
Term
How do you work out the F value in an ANOVA test?
Definition
Divide the treatment mean square by the residual mean square
Term
What are the three important R commands you may need?
Definition
str = structure of the data (columns and their types)
head = first few lines of data
dim = size of data matrix (rows and columns)
Term
How should a visual basic excel file be saved?
Definition
.xlsm
Term
How should most excel files be saved?
Definition
.xlsx, .csv or .txt

If it is a Visual Basic file then .xlsm
Term
How do you fix a cell in an excel formula when dragging copying the cell formula?
Definition
Use a $ sign in front of the column letter and row number of the cell you are fixing in the formula (e.g. $A$1)
Term
What is a link function?
Definition
This is the relationship between the value of y in a linear model and a value η, which represents some or all of the variables in the linear model.

The link function is just the relationship between the two and will help predict values for y with increasing or decreasing values for x in the model.

For the logistic function, this creates the S-shaped curve with asymptotes at y=0 and y=1, which it never quite reaches