| Term 
 
        | What is Induction? |  | Definition 
 
        | The derivation of general ideas from specific observations - not used by scientists; scientists instead use hypothesising and experimentation |  | 
        |  | 
        
        | Term 
 
        | What is Hypothetico-deductive reasoning? |  | Definition 
 
        | Observations lead to plausible hypotheses, which we then attempt to falsify. If we cannot prove them false, they are good hypotheses, but not necessarily right |  | 
        |  | 
        
        | Term 
 
        | What is a Theory? |  | Definition 
 
        | A general set of ideas or rules used to explain a group of observations |  | 
        |  | 
        
        | Term 
 
        | What is a Paradigm shift? |  | Definition 
 
        | A change in the way we think about a subject |  | 
        |  | 
        
        | Term 
 
        | What is a Null Hypothesis? |  | Definition 
 
        | H0 - the form of a hypothesis that we formally test; it predicts that nothing will happen |  | 
        |  | 
        
        | Term 
 
        | What is an Alternative hypothesis? |  | Definition 
 
        | H1, A specific prediction about an experiment |  | 
        |  | 
        
        | Term 
 
        | What is Nominal data? |  | Definition 
 
        | Data in categories with names |  | 
        |  | 
        
        | Term 
 
        | What is Count data? |  | Definition 
 
        | Data that rises in whole-number steps (counts)   Is treated as non-parametric |  | 
        |  | 
        
        | Term 
 
        | What is Ordinal data? |  | Definition 
 
        | Non-quantitative ranked data, normally used in questionnaires   Is treated as non-parametric but can be transformed |  | 
        |  | 
        
        | Term 
 
        | What is Continuous data? |  | Definition 
 
        | Quantitative measurements on a continuous scale   Treated as parametric |  | 
        |  | 
        
        | Term 
 
        | What are Descriptive statistics? |  | Definition 
 
        | Measures calculated from a data set which summarise some characteristics of the data |  | 
        |  | 
        
        | Term 
 
        | Measures of central tendency |  | Definition 
 
        | The mean, median and mode |  | 
        |  | 
        
        | Term 
 
        | What is a Histogram? |  | Definition 
 
        | A graph showing the total number of quantitative observations in each of a series of numerically ordered categories |  | 
        |  | 
        
        | Term 
 
        | What is the Sum of Squares (SS)? |  | Definition 
 
        | Total of all the squared deviates in a data set; squaring removes the minus sign. SS shows the magnitude of the variability but not the direction |  | 
        |  | 
        
        | Term 
 
        | What is Variance? |  | Definition 
 
        | s² - the average size of the squared deviates in a sample - an estimate of the population variance |  | 
        |  | 
        
        | Term 
 
        | What is Standard deviation? |  | Definition 
 
        | s - the average size of deviates in a data set. |  | 
        |  | 
        
        | Term 
 
        | What is a Population? |  | Definition 
 
        | All individuals in a group |  | 
        |  | 
        
        | Term 
 
        | What is a Sample? |  | Definition 
 
        | A sub-set of a population, meant to represent it |  | 
        |  | 
        
        | Term 
 
        | What is a Normal distribution? |  | Definition 
 
        | Bell-shaped, Gaussian; about 68% of all data points lie within one SD of the mean |  | 
        |  | 
        
        | Term 
 
        | Standard error of the mean |  | Definition 
 
        | A measure of the confidence we have in our sample mean as an estimate of the real mean |  | 
        |  | 
        
        | Term 
 
        | What is Skew? |  | Definition 
 
        | If skewed to the right, there is a long tail to the right, and vice versa for the left |  | 
        |  | 
        
        | Term 
 
        | What are Parametric tests? |  | Definition 
 
        | Tests which make many assumptions |  | 
        |  | 
        
        | Term 
 
        | What are Non-parametric tests? |  | Definition 
 
        | Tests which make fewer assumptions |  | 
        |  | 
        
        | Term 
 
        | What is a Poisson distribution? |  | Definition 
 
        | A distribution where a maximum possible count is far above the mean, resulting in a skew |  | 
        |  | 
        
        | Term 
 
        | What is a Binomial distribution? |  | Definition 
 
        | A distribution where the maximum count is close to the mean |  | 
        |  | 
        
        | Term 
 
        | Bar charts |  | Definition 
 
        | Used for visualising differences |  | 
        |  | 
        
        | Term 
 
        | Line graphs |  | Definition 
 
        | Used for visualising trends |  | 
        |  | 
        
        | Term 
 
        | What is Precision? |  | Definition 
 
        | A measurement is not precise if there is an unbiased (random) measurement error |  | 
        |  | 
        
        | Term 
 
        | What is Accuracy? |  | Definition 
 
        | A measurement is accurate if it is free from bias; bias occurs when there is a systematic error in your measurements |  | 
        |  | 
        
        | Term 
 
        | What is a Confounding effect? |  | Definition 
 
        | A confounding effect is something that influences your results in a way that can be confused with the effect you are studying |  | 
        |  | 
        
        | Term 
 
        | What is a Threshold effect? |  | Definition 
 
        | Effects of a variable are only visible once above a certain point |  | 
        |  | 
        
        | Term 
 
        | What is a Ceiling effect? |  | Definition 
 
        | Effects of a variable are only visible below a certain point |  | 
        |  | 
        
        | Term 
 
        | Independent samples t-test |  | Definition 
 
        | A statistical test designed to test for a difference between the means of two samples of continuous data |  | 
        |  | 
        
        | Term 
 
        | What is a Type I error? |  | Definition 
 
        | The rejection of the null hypothesis when it is true |  | 
        |  | 
        
        | Term 
 
        | What is a Type II error? |  | Definition 
 
        | The failure to reject the null hypothesis when it is false |  | 
        |  | 
        
        | Term 
 
        | What is Pseudoreplication? |  | Definition 
 
        | The use of non-independent data points as if they were independent |  | 
        |  | 
        
        | Term 
 
        | What is a Paired t-test? |  | Definition 
 
        | A test designed for samples that are not independent of each other, normally used to examine change |  | 
        |  | 
        
        | Term 
 
        | What is Homogeneity of variance? |  | Definition 
 
        | If the variance is homogeneous, it is the same in each sample |  | 
        |  | 
        
        | Term 
 
        | What is a Chi-squared test? |  | Definition 
 
        | A test which is used to examine differences between observed and expected counts |  | 
        |  | 
        
        | Term 
 
        | Pearsons correlation coefficient |  | Definition 
 
        | The statistic used to test the significance of correlations between two variables. Can only be used with linear relationships and normal distributions |  | 
        |  | 
        
        | Term 
 
        | Spearmans rank correlation coefficient |  | Definition 
 
        | Non-parametric correlation test |  | 
        |  | 
        
        | Term 
 
        | One-way ANOVA |  | Definition 
 
        | Tests the null hypothesis that the sample means are not different |  | 
        |  | 
        
        | Term 
 
        | What is the Kruskal-Wallis test? |  | Definition 
 
        | Non-parametric one way ANOVA |  | 
        |  | 
        
        | Term 
 
        | What is ANCOVA? |  | Definition 
 
        | Combines anova and regression |  | 
        |  | 
        
        | Term 
 
        | What makes a good hypothesis? |  | Definition 
 
        | Clear, Precise, Plausible, Able to produce testable predictions |  | 
        |  | 
        
        | Term 
 
        | Sum of squares formula |  | Definition 
 
        | SS = \sum_{i=1}^{n} (x_i - \bar{x})^2
 |  | 
        |  | 
        
        | Term 
 
        | Variance formula |  | Definition 
 
        | s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
 |  | 
        |  | 
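The two formula cards above can be checked with a short script. A minimal sketch in Python (the data values are made up for illustration):

```python
import statistics

# Hypothetical sample (values made up for illustration)
x = [4.0, 6.0, 5.0, 7.0, 8.0]
mean = statistics.mean(x)  # x-bar

# Sum of squares: total of the squared deviates
ss = sum((xi - mean) ** 2 for xi in x)

# Sample variance: s^2 = SS / (n - 1)
variance = ss / (len(x) - 1)

# Standard deviation: s = sqrt(s^2)
sd = variance ** 0.5

print(ss, variance, sd)
```

The same values come back from `statistics.variance` and `statistics.stdev`, which use n - 1 as well.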
        
        | Term 
 
        | 95% of samples are within |  | Definition 
        | ±1.96 standard deviations of the mean |  | 
        |  | 
        
        | Term 
 
        | Standard error of mean formula |  | Definition 
        | SEM = s ÷ √n (the standard deviation divided by the square root of the sample size) |  | 
        |  | 
        
        | Term 
 
        | Which tests have more statistical power? |  | Definition 
        | Parametric tests |  | 
        |  | 
        
        | Term 
 
        | Parametric test standard assumptions |  | Definition 
 
        | Independence, homogeneity of variance, and normality of the data
 |  | 
        |  | 
        
        | Term 
 
        | Alternative t test if the variances are not the same |  | Definition 
        | Welch's t-test |  | 
        |  | 
        
        | Term 
 
        | To test if the variances are the same? |  | Definition 
        | An F-test: divide the larger variance by the smaller and compare against the F distribution |  | 
        |  | 
        
        | Term 
 
        | If data is not normal, you can transform it by... |  | Definition 
 
        | Taking the square root or arcsine of all the points, eliminating a right skew; squaring all the points, eliminating a left skew
 |  | 
        |  | 
        
        | Term 
 
        | Alternative T-test if the data is not normal |  | Definition 
        | The Mann-Whitney U test (Wilcoxon two-sample test) |  | 
        |  | 
        
        | Term 
 | Definition 
 
        | adjust for chance of a type 1 error |  | 
        |  | 
        
        | Term 
 
        | In regression, we analyse |  | Definition 
 
        | the effect of one variable on another variable |  | 
        |  | 
        
        | Term 
 
        | How do you work out the Probability of two independent events occurring? |  | Definition 
 
        | Multiply the probabilities of the two events by each other: P(A and B) = P(A) × P(B) |  | 
        |  | 
        
        | Term 
 
        | How do you work out the probability of two non-independent events occurring? |  | Definition 
 
        | Multiply the probability of one event occurring by the probability of the other event occurring IF the first event occurs. 
 P(A and B) = P(A) · P(B|A)
 |  | 
        |  | 
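The two multiplication rules above can be sketched in Python; the probabilities here are made up for illustration:

```python
# Hypothetical probabilities (illustration only)
p_a = 0.5          # P(A)
p_b = 0.2          # P(B)
p_b_given_a = 0.3  # P(B|A), for the non-independent case

# Independent events: P(A and B) = P(A) * P(B)
p_independent = p_a * p_b

# Non-independent events: P(A and B) = P(A) * P(B|A)
p_dependent = p_a * p_b_given_a

print(p_independent, p_dependent)
```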
        
        | Term 
 
        | What is the difference between sampling with or without replacement? |  | Definition 
 
        | The probability of drawing a particular sample from a group differs between draws depending on whether you replace each sample back into the group after removing it. 
 This makes replacement very important for statistical conditional probability
 |  | 
        |  | 
        
        | Term 
 
        | What is Sampling bias? |  | Definition 
 
        | This is when you take a sample that does not represent the true population - perhaps due to biological differences in the species - meaning you will get biased, skewed data |  | 
        |  | 
        
        | Term 
 
        | What is a Bonferroni correction? |  | Definition 
 
        | Doing multiple parallel tests will often produce a Type I error purely because of the number of tests, so you perform a Bonferroni correction. 
 You divide the significance threshold (e.g. 0.05) by the number of tests that were independently performed.
 
 This gives a lower significance threshold and a much lower chance of a Type I error
 |  | 
        |  | 
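The correction above is a single division. A minimal sketch in Python, with a made-up number of tests:

```python
alpha = 0.05   # overall significance threshold
n_tests = 10   # hypothetical number of parallel tests

# Bonferroni correction: each individual test must now pass p < alpha / n_tests
corrected_threshold = alpha / n_tests

print(corrected_threshold)
```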
        
        | Term 
 
        | How can you correct a type II error? |  | Definition 
 
        | You can perform more repetitions or take more samples, as it is just to do with statistical power. With more statistical power you are able to reject or retain the null hypothesis with statistical confidence
 |  | 
        |  | 
        
        | Term 
 
        | What is Overfitting? |  | Definition 
 
        | This is when you will have too many variables and assumptions in a statistical test. This means that random statistical noise that would usually be insignificant, is assumed to be significant because of all the statistical variables |  | 
        |  | 
        
        | Term 
 
        | What is the binomial probability distribution of certain number of events (i) in a certain (n) number of trials where (p) is the probability of a certain event outcome in a singular trial? |  | Definition 
 
        | [image] 
 This explains the probability of seeing a certain number (i) of one outcome in a certain number of total trials (n), where (p) is the probability of the outcome we are looking at
 |  | 
        |  | 
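The binomial probability described above (the [image] is the standard formula) can be sketched in Python; the coin-toss example is made up for illustration:

```python
import math

def binomial_prob(i, n, p):
    """P(exactly i occurrences of an outcome in n trials,
    where p is the probability of that outcome in one trial)."""
    return math.comb(n, i) * p ** i * (1 - p) ** (n - i)

# e.g. the probability of exactly 3 heads in 10 fair coin tosses
prob = binomial_prob(3, 10, 0.5)
print(prob)
```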
        
        | Term 
 
        | What does the poisson distribution equation look like? |  | Definition 
 
        | [image] 
 This is where (i) is the number of events we are working out the probability of seeing and (m) is the mean number of times the event occurs
 |  | 
        |  | 
        
        | Term 
 
        | When do you use the binomial distribution probability test and when do you use poisson? |  | Definition 
 
        | The binomial should be used when there is a fixed number of trials in the experiment. Poisson should be used if it is open ended
 |  | 
        |  | 
        
        | Term 
 
        | When do you do a one tailed test or a two-tailed test? |  | Definition 
 
        | One-tailed is when you expect the data to trend in a certain direction away from the average. 
 Two-tailed is when the data may go either way and you are not sure which; the significance level is then split (halved) between the two tails.
 |  | 
        |  | 
        
        | Term 
 
        | How do you perform a chi squared test for independence? |  | Definition 
 
        | First calculate the expected values [image]
 then take them away from the observed values and square the differences.
 
 Then divide each squared difference by its expected value and sum them (χ² = Σ(O−E)²/E); comparing this statistic against the χ² distribution gives a probability which may or may not be below 0.05
 
 If it is below 0.05 the association is significant and the null hypothesis of independence is rejected
 |  | 
        |  | 
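The steps above can be sketched as a short script. The 2×2 table of counts is made up for illustration:

```python
# Hypothetical 2x2 table of observed counts (illustration only)
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count for each cell: row total * column total / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (O - E)^2 / E over all cells
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R-1)(C-1)
print(chi2, df)
```

The statistic would then be compared against the χ² distribution with `df` degrees of freedom to obtain the p value.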
        
        | Term 
 
        | How to work out the degrees of freedom when there are multiple rows and columns? |  | Definition 
 
        | (R-1)(C-1) Is the way to work that out |  | 
        |  | 
        
        | Term 
 
        | When testing data's significance using a chi squared non-independence test, what is the result? |  | Definition 
 
        | When the final p value is below 0.05 the result is significant, but it only tells you that the variables are NOT independent (i.e. they are associated) - it says nothing about the size or direction of the effect
 |  | 
        |  | 
        
        | Term 
 
        | What test can be used to test for significant difference between two different sets of data that are not normal? |  | Definition 
 
        | Wilcoxon two sample test. The null hypothesis is that the data sets are not statistically different and any differences are random.
 |  | 
        |  | 
        
        | Term 
 
        | When is a binomial distribution normal? |  | Definition 
 
        | When p=0.5 for the events occurring and the curve is symmetrical |  | 
        |  | 
        
        | Term 
 
        | How is the bell of a normally distributed curve defined? |  | Definition 
 
        | The curve is centred on the mean, and the standard deviation defines the width of the bell
        |  | 
        
        | Term 
 
        | How do you test for Normality of data? |  | Definition 
 
        | Shapiro Wilk Test for the normality of data.
 Test statistic W and p value.
 If the value for p is above the 0.05 value, then the data is normal.
 |  | 
        |  | 
        
        | Term 
 
        | What is the difference between parameters and statistics? |  | Definition 
 
        | Parameters describe the POPULATION (usually unknown, so they are assumed or estimated) and statistics are the KNOWN results calculated from the sample you have taken from the population. 
 For a normal distribution, a population has a mean of μ and standard deviation σ, while a sample has a mean of x̄ and a standard deviation of s.
 |  | 
        |  | 
        
        | Term 
 
        | What is the equation for Standard deviation? |  | Definition 
        | s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} |  | 
        |  | 
        
        | Term 
 
        | How do you work out how different you statistical data is from the assumed population reality? |  | Definition 
 
        | With the mean, you are able to work this out by calculating the Standard error of the mean. 
 This is the (standard deviation)÷(square root of the sample size)
 |  | 
        |  | 
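The SEM calculation above is one line of code. A minimal sketch in Python, with made-up data:

```python
import statistics

# Hypothetical sample (values made up for illustration)
x = [12.0, 15.0, 11.0, 14.0, 13.0]

# Standard error of the mean = standard deviation / sqrt(sample size)
sem = statistics.stdev(x) / len(x) ** 0.5
print(sem)
```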
        
        | Term 
 
        | What is the central limit theorem? |  | Definition 
 
        | The idea that the means of repeated samples are normally distributed, whatever the shape of the underlying data; this is why 95% of sample means fall within ±1.96 standard errors of the true mean
        |  | 
        
        | Term 
 
        | What is meant by the 95% confidence interval of a mean? |  | Definition 
 
        | This is the range mean ±1.96 standard errors, within which we are 95% confident the true population mean lies
        |  | 
        
        | Term 
 
        | What is the sample size limit for similarities in population and sample sd to be assumed? |  | Definition 
 
        | If a sample size is above 30, the sample standard deviation can be assumed similar enough to the population standard deviation, so the 95% confidence interval of the mean can use ±1.96. If the sample is below 30, instead of 1.96 we use a value called the t value.
 This is derived by taking the degrees of freedom (n-1) and then looking down the p=0.975 column of a t-distribution table.
 |  | 
        |  | 
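The large-sample case above (n ≥ 30, so 1.96 is used rather than a t value) can be sketched in Python; the data are made up for illustration:

```python
import statistics

# Hypothetical large sample: 30 made-up data points, so 1.96 applies
x = [float(v) for v in range(50, 80)]

mean = statistics.mean(x)
sem = statistics.stdev(x) / len(x) ** 0.5

# 95% confidence interval for the mean: mean +/- 1.96 * SEM
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
print(ci_low, ci_high)
```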
        
        | Term 
 
        | What is the basic normal distribution test for difference between two means? |  | Definition 
 
        | [image] 
 Often having to double the resulting p value for the Z test statistic, as it is often a two-tailed test.
 |  | 
        |  | 
        
        | Term 
 
        | In the anova test, what is meant by treatment effect and residual effect? |  | Definition 
 
        | The residual effect is how much each individual sample differs from its group mean, and the treatment effect is how much the group mean differs from the grand mean
        |  | 
        
        | Term 
 
        | What are the two sets of DF in the anova test? |  | Definition 
 
        | One is the df of the groups used (groups − 1). The other df is the total number of samples − the number of groups
 |  | 
        |  | 
        
        | Term 
 
        | What is the significance of the F value in the the ANOVA test? |  | Definition 
 
        | The F threshold is based upon the two degrees-of-freedom values. The specific F value is the treatment mean squared deviate ÷ the residual mean squared deviate
 
 If the specific value is higher than the threshold then the difference is significant
 |  | 
        |  | 
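The treatment/residual breakdown above can be sketched as a one-way ANOVA F calculation in Python; the three groups of values are made up for illustration:

```python
import statistics

# Hypothetical data: three groups (illustration only)
groups = [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]

all_vals = [v for g in groups for v in g]
grand_mean = statistics.mean(all_vals)

# Treatment SS: how far each group mean is from the grand mean
ss_treat = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Residual SS: how far each value is from its own group mean
ss_resid = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)

df_treat = len(groups) - 1              # groups - 1
df_resid = len(all_vals) - len(groups)  # total samples - number of groups

# F = treatment mean square / residual mean square
f_value = (ss_treat / df_treat) / (ss_resid / df_resid)
print(f_value)
```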
        
        | Term 
 
        | What is the non-parametric equivalent of the ANOVA test? |  | Definition 
 
        | Kruskal-Wallis Test - the one-way analysis of variance for sets of independent data with equal or different sample sizes.
 Used if the data are not normal or the variances are unequal (either one or both).
 The test statistic is χ2, along with the among- and within-group squared deviates and the p value to know if the result is significant.
 |  | 
        |  | 
        
        | Term 
 
        | What is a two-way ANOVA used for? |  | Definition 
 
        | You are checking whether two or more factors each have a significant effect on a certain test variable (and whether they interact) 
 [image]
 |  | 
        |  | 
        
        | Term 
 
        | What is the most common form of transformation and why would you do it? |  | Definition 
 
        | Take the log10 of the non-normal data, as this may then give a normal distribution. Do this to be able to perform parametric tests, as they have much more statistical power
 |  | 
        |  | 
        
        | Term 
 
        | What is the difference between a t test and a paired t test? |  | Definition 
 
        | A paired t test has more statistical power. A normal t test just compares the means of the two groups.
 A paired t test compares the mean of the within-pair differences against 0
 |  | 
        |  | 
        
        | Term 
 
        | What is the non-parametric version of the paired t test? |  | Definition 
 
        | Wilcoxon signed rank test. Test statistic = V
 |  | 
        |  | 
        
        | Term 
 
        | What is the equation for working out correlation, r? |  | Definition 
 
        | [image] 
 Gives the correlation coefficient
 
 This will be Pearson's rank coefficient
 It is a different equation for the Spearman's rank
 |  | 
        |  | 
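Pearson's r (the [image] above is the standard formula) can be sketched in Python; the paired measurements are made up for illustration:

```python
import statistics

# Hypothetical paired measurements (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.9]

mx, my = statistics.mean(x), statistics.mean(y)

# r = sum of cross-products of deviates / sqrt(SSx * SSy)
r = (sum((a - mx) * (b - my) for a, b in zip(x, y))
     / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5)

r_squared = r ** 2  # coefficient of determination
print(r, r_squared)
```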
        
        | Term 
 
        | What is the coefficient of determination? |  | Definition 
 
        | This is the correlation value squared. It represents the proportion of the variance in one variable that is explained by the variance in the other variable
 |  | 
        |  | 
        
        | Term 
 
        | How do you test for significance of the correlation coefficient? |  | Definition 
 
        | Work out the standard error of the correlation: [image]
 
 And then divide the correlation coefficient by the standard error of the correlation.
 If this value is larger than the corresponding t value for your df in the p=0.975 column (correlation significance testing is two-tailed), then the correlation is significant.
 
 In R this will be given as a p value, and the null hypothesis is that the correlation is not significant
 |  | 
        |  | 
        
        | Term 
 
        | How do you work out the slope for regression? |  | Definition 
 
        | The slope is just: [image] 
 And the intercept is the value of y where x = 0 (where the line crosses the y axis)
 |  | 
        |  | 
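The slope and intercept calculation above can be sketched in Python (the [image] is the usual least-squares slope formula); the data are made up for illustration:

```python
# Hypothetical paired data (illustration only); the true line here is y = 2x + 1
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

mx = sum(x) / len(x)
my = sum(y) / len(y)

# slope b = sum((x - mx)(y - my)) / sum((x - mx)^2)
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))

# intercept a: the value of y where x = 0
a = my - b * mx
print(b, a)
```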
        
        | Term 
 
        | How do you test for significant regression? |  | Definition 
 
        | Base it upon the results of correlation. If the correlation is significant, so is the regression, and vice versa |  | 
        |  | 
        
        | Term 
 
        | What does epsilon, ε, show? |  | Definition 
 
        | In linear models, ε is the error term: the model will always incur some error. The error is assumed to be the same no matter what the other values are |  | 
        |  | 
        
        | Term 
 
        | What is linear model of regression? |  | Definition 
        | y = α + βx + ε |  | 
        |  | 
        
        | Term 
 
        | What is the linear model of the t test? |  | Definition 
        | y = α + βx + ε, where x is a categorical variable with two levels (the two groups) |  | 
        |  | 
        
        | Term 
 
        | What is the linear model of the ANOVA test? |  | Definition 
        | y = α + βx + ε, where x is a categorical factor with several levels (one group mean per level) |  | 
        |  | 
        
        | Term 
 
        | What is the linear model for a two way ANOVA? |  | Definition 
 
        | y = α + βx1 + γx2 + ε (one term per factor) |  | 
        |  | 
        
        | Term 
 
        | How do you assess the fit of the model? |  | Definition 
 
        | The mean of the squared deviations of the actual values of y from the values predicted by the model 
 The further the data values are from the model values, the worse the fit of the model
 |  | 
        |  | 
        
        | Term 
 
        | How do you reduce overfitting? |  | Definition 
 
        | You produce a minimum adequate model: the linear model with the fewest variables in it. Only include the variables that really make a difference, and ignore minimal-effect variables, otherwise they will disrupt your results
 |  | 
        |  | 
        
        | Term 
 
        | What does the + mean in a linear model? |  | Definition 
 
        | It just means that in the model, that variable is included. Does not mean mathematical addition
 |  | 
        |  | 
        
        | Term 
 
        | How does adding more variables to a linear model effect the value of the sum of squared deviates? |  | Definition 
 
        | Adding more variables will ALWAYS reduce the residual sum of squared deviates 
 Therefore if the improvement is not significant, the minimum adequate model should be picked over the model with more variables
 |  | 
        |  | 
        
        | Term 
 
        | What is the logistic function equation and graph for models that will have an upper and lower maximum? |  | Definition 
        | y = e^η ÷ (1 + e^η), giving an S-shaped curve with an upper and a lower asymptote |  | 
        |  | 
        
        | Term 
 | Definition 
 
        | This is when you take an η value and predict a y value by putting the η value into a link function. This is used in generalised models and looks like this:
 |  | 
        |  | 
        
        | Term 
 
        | What data can be used in parametric tests? |  | Definition 
 
        | Continuous normal data 
 If it is count data, it is not parametric and therefore you cannot do a parametric test
 |  | 
        |  | 
        
        | Term 
 
        | What form should linear regression lines be in? |  | Definition 
        | y = a + bx (a straight line: intercept plus slope × x) |  | 
        |  | 
        
        | Term 
 
        | How can you quickly tell the difference between binomially and poisson distributed data? |  | Definition 
 
        | Poisson-distributed data will have some expected values massively higher than the real data; binomial expected values will be pretty close
 |  | 
        |  | 
        
        | Term 
 
        | What is the difference between the Mann-Whitney and Wilcoxon? |  | Definition 
 
        | Both are non-parametric versions of the t test. 
 The Mann-Whitney is used for independent data.
 
 The Wilcoxon signed-rank version is used for paired data
 |  | 
        |  | 
        
        | Term 
 
        | How do you find outliers in R? |  | Definition 
 
        | Plot the data on a Cleveland plot |  | 
        |  | 
        
        | Term 
 
        | How do you find homogeneity of variance errors in R? |  | Definition 
 
        | Plot the data on conditional box plot |  | 
        |  | 
        
        | Term 
 
        | How do you find errors of normality in your data on R? |  | Definition 
 
        | Plot the data in a histogram |  | 
        |  | 
        
        | Term 
 
        | How do you find errors of too many zeros in your data in R? |  | Definition 
 
        | Plot data into a Frequency histogram |  | 
        |  | 
        
        | Term 
 
        | How do you find errors in interactions of data in R? |  | Definition 
 
        | Plot data into a conditional plot |  | 
        |  | 
        
        | Term 
 
        | What is a t test for normally distributed data but have unequal variances? |  | Definition 
        | Welch's t-test |  | 
        |  | 
        
        | Term 
 
        | How do you work out the F value in an ANOVA test? |  | Definition 
 
        | Divide the treatment mean square by the residual mean square |  | 
        |  | 
        
        | Term 
 
        | What are the three important R commands you may need? |  | Definition 
 
        | str = structure of the data columns 
 head = first few data lines
 dim = size of the data matrix
 |  | 
        |  | 
        
        | Term 
 
        | How should a visual basic excel file be saved? |  | Definition 
        | As a .xlsm file |  | 
        |  | 
        
        | Term 
 
        | How should most excel files be saved? |  | Definition 
 
        | .xlsx, .csv or .txt 
 If it is a visual file then .xlsm
 |  | 
        |  | 
        
        | Term 
 
        | How do you fix a cell in an excel formula when dragging copying the cell formula? |  | Definition 
 
        | Use a $ sign in front of the cell you are fixing in the formula |  | 
        |  | 
        
        | Term 
 
        | What is a link function? |  | Definition 
 
        | This is the relationship between the value of y in a linear model and a value η, which represents some or all of the variables in the linear model. 
 The link function is just the relationship between the two and will help predict values for y with increasing or decreasing values for x in the model.
 
 Will create the S-shaped curve with asymptotes that never reach y=0 or y=1
 |  | 
        |  |