Shared Flashcard Set

Details

Stats Midterm
Review of Terms, Concepts and Formulas
67
Mathematics
Undergraduate 1
03/13/2011

Cards

Term
Basic Context for Data (5-6 questions to ask)
Definition
Who, What, When, Where, Why and How?
Term
Categorical Variable
Definition
A variable whose values name categories rather than a quantity that can be summed or otherwise manipulated arithmetically. The categories can be coded with numbers, but those numbers are only labels.
Term
Quantitative Variable
Definition
A variable whose values are measured in units and represent amounts of something or counts of an occurrence.
Term
Identifier Variables
Definition
A unique number assigned to each individual case for identification and sorting purposes, not for analysis
Term
Frequency Table/ Relative Frequency Table
Definition
A table that lists each category with its total count (frequency table), or one that gives each count as a proportion of the whole, usually as a percent (relative frequency table)
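A minimal sketch of both tables, assuming a made-up list of class years (the data and variable names are illustrative, not from the course):

```python
from collections import Counter

# Hypothetical categorical data: class year of six students
years = ["Freshman", "Junior", "Freshman", "Senior", "Junior", "Freshman"]

counts = Counter(years)        # frequency table: category -> count
total = sum(counts.values())

# Relative frequency table: each count as a percent of the total
relative = {category: 100 * n / total for category, n in counts.items()}

for category in counts:
    print(f"{category:10s} {counts[category]:3d} {relative[category]:6.1f}%")
```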
Term
Bar Chart
Definition
Displays distribution of a categorical variable. NOT a quantitative variable
Term
Contingency Table
Definition
A table that shows how cases are distributed across the categories of two categorical variables at once, breaking the totals down into their parts. The margins give the totals
Term
Area Principle
Definition
When graphing data, make sure each category has an area proportional to its share of the whole group
Term
Simpson's Paradox
Definition
Unfair averaging over groups that differ in conditions or size. A trend seen within every group can weaken or even reverse when the groups are combined
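A small sketch of the reversal, using illustrative success counts for two hypothetical treatments (the numbers and names are assumptions chosen to show the effect):

```python
# Illustrative (successes, attempts) for two treatments in two groups
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, attempts):
    return successes / attempts

def overall(groups):
    # Pool the groups, ignoring that they differ in size and difficulty
    successes = sum(s for s, _ in groups.values())
    attempts = sum(n for _, n in groups.values())
    return successes / attempts

for group in ("small", "large"):
    # A has the higher success rate inside each group
    print(group, rate(*data["A"][group]) > rate(*data["B"][group]))

# After pooling, B looks better: the comparison has flipped
print("overall", overall(data["A"]) > overall(data["B"]))
```

The flip happens because A was mostly used on the harder "large" group while B was mostly used on the easier "small" group.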
Term
Histogram
Definition
Only for quantitative data. Looks like a bar chart (which is only for categorical data) except that there is no space between bars unless there is a gap in the data. Good for illustrating a distribution
Term
Stem and Leaf Displays (and Dotplots)
Definition
Writing the leading digit(s) (the stem) on one side of the table, then listing the next digit (the leaf) for each case in that range. Dotplots replace digits with dots
Term
Three things to mention when describing distribution
Definition
Shape: How many modes? Symmetric or skewed? Any outliers?
Center: Median/ Mean
Spread: Standard deviation/ Interquartile range
Term
Unimodal /Bimodal/ Multimodal
Definition
With one hump/ 2 humps/ more than 2 humps
Term
Uniform (Shape)
Definition
A distribution that is roughly flat: all values occur about equally often, with no clear mode or trend
Term
Skew
Definition
When one tail (a thinner end of the distribution) stretches out farther than the other, the distribution is said to be skewed toward the longer tail
Term
Interquartile Range (IQR)
Definition
The upper quartile (75th percentile) minus the lower quartile (25th percentile)
Term
Variance
Definition
The sum of the squared differences between each y value and the mean, divided by (n-1)

It is the quantity you take the square root of to find the standard deviation
Term
Standard Deviation
Definition
Take the square root of the variance:
(Sum of the squared differences between each y and the mean)/ (n-1)
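A short sketch of the variance and standard deviation formulas on made-up data, checked against Python's `statistics` module (the data values are assumptions for illustration):

```python
import math
import statistics

y = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data

mean = sum(y) / len(y)

# Variance: sum of squared differences from the mean, divided by (n - 1)
variance = sum((value - mean) ** 2 for value in y) / (len(y) - 1)

# Standard deviation: square root of the variance
sd = math.sqrt(variance)

print(variance, statistics.variance(y))  # the two should agree
print(sd, statistics.stdev(y))
```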
Term
Drawing Boxplots
Definition
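Draw a box from the lower quartile to the upper quartile, with a line at the median (not the mean). Extend whiskers to the most extreme data values that lie within 1.5 times the IQR of the quartiles, and plot any points beyond that individually as outliers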
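A sketch of the numbers behind a boxplot, assuming made-up data. Quartile conventions vary between textbooks and software; this uses the "inclusive" method of `statistics.quantiles`, which is one choice among several:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up values

# Box: lower quartile, median, upper quartile
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences sit 1.5 IQRs beyond the quartiles
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme values still inside the fences
inside = [v for v in data if low_fence <= v <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything outside the fences is plotted on its own as an outlier
outliers = [v for v in data if v < low_fence or v > high_fence]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```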
Term
Z-Score (Standardized Value)
Definition
(y - the mean of y)/ standard deviation of y. Written z(x) or z(y)
Term
How does standardizing data change data
Definition
Shape: Does not change
Center: Makes the mean 0
Spread: The standard deviation becomes 1
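A quick check of these three effects on made-up data (the values are assumptions for illustration):

```python
import statistics

y = [12.0, 15.0, 9.0, 20.0, 14.0]  # made-up data

mean = statistics.mean(y)
sd = statistics.stdev(y)

# z-score: how many standard deviations each value sits from the mean
z = [(value - mean) / sd for value in y]

print(statistics.mean(z))   # approximately 0
print(statistics.stdev(z))  # approximately 1

# Shape is unchanged: the values keep the same ordering
same_order = sorted(range(len(y)), key=lambda i: y[i]) == sorted(range(len(z)), key=lambda i: z[i])
print(same_order)
```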
Term
Nearly Normal Condition
Definition
The shape of the data's distribution is unimodal and symmetric. If so, you can apply the Normal model. Make a picture (histogram or Normal probability plot) to check
Term
The 68-95-99.7 Rule
Definition
In a Normal model, about 68% of the data falls within 1 standard deviation of the mean (above or below), about 95% within 2, and about 99.7% within 3
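The three percentages can be recovered from the standard Normal model; a minimal sketch using `statistics.NormalDist` (the helper name `within` is an illustrative choice):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Fraction of a Normal model lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * within(k), 1))
```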
Term
Finding Normal Percentiles
Definition
Calculate the z-score, then find its first two digits (ones and tenths) down the left side of the Normal table and its hundredths digit across the top. The entry where they meet is the corresponding Normal percentile
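The table lookup can be mimicked in code. A sketch assuming a hypothetical Normal model with mean 100 and standard deviation 16 (these numbers are made up for illustration):

```python
from statistics import NormalDist

mean, sd = 100, 16   # hypothetical model
y = 120              # value whose percentile we want

z = (y - mean) / sd                # z-score: 1.25
percentile = NormalDist().cdf(z)   # area to the left of z under the standard Normal curve

print(z, round(percentile, 4))
```

In the table, z = 1.25 would be found at row 1.2, column 0.05.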
Term
Normal Probability Plot
Definition
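The y axis shows the data values (the x axis of the corresponding histogram, ex. mpg) and the x axis shows the z-score each data point would be expected to have under a Normal model. For nearly Normal data the points should fall along a straight diagonal line rising from left to right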
Term
Things to look for in Scatterplots
Definition
Direction: Is it positive or negative
Form: Is it linear? Curved?
Strength: How much does it scatter?
Outliers: Any points that stand away from the overall pattern
Term
Predictor/ Explanatory Variable
Definition
The variable plotted on the x axis, believed to inform or predict the y value
Term
Response Variable
Definition
The variable plotted on the y axis and the variable of interest. This is the y used in the standard deviation and related formulas
Term
Correlation (r)
Definition
Measures the strength of the linear association between two quantitative variables.

r = (the sum of z(x) times z(y)) / (n-1)
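A sketch of the formula on made-up data: standardize both variables, multiply the z-scores pairwise, sum, and divide by n - 1 (function names are illustrative):

```python
import statistics

x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]

def z_scores(values):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def correlation(x, y):
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

r = correlation(x, y)
print(round(r, 4))
```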
Term
Correlation Conditions
Definition
Quantitative Variables Condition: Make sure the data isn't categorical
Straight Enough Condition: It is subjective, but make sure the data isn't clearly non-linear
Outlier Condition: Make sure outliers are not present, as they can distort the correlation dramatically

Check these conditions with a scatterplot
Term
Lurking Variable
Definition
A hidden variable, not included in the analysis, that affects both variables being studied. It explains why correlation can be misleading and does not prove causation
Term
Kendall's tau
Definition
Designed to assess how close the relationship between two variables is to being monotone. A monotone relationship is one that consistently increases or consistently decreases, though not necessarily linearly. A value of -1 means always decreasing, 1 means always increasing. It's a nonparametric measure
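The card does not give a formula. A minimal sketch using the standard pair-counting definition (tau-a), assuming no tied values: compare every pair of points and count how often x and y move in the same direction.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs. Assumes no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:
            concordant += 1   # x and y move the same way
        elif direction < 0:
            discordant += 1   # x and y move opposite ways
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

# A curved but always-increasing relationship still gets tau = 1
print(kendall_tau([1, 2, 3, 4], [1, 4, 9, 16]))
```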
Term
Spearman's Rho
Definition
Is less sensitive to outliers than r. Replaces each x value and each y value with its rank (1, 2, 3, etc.) and finds the correlation of the ranks. Also between -1 and 1. It is a nonparametric measure.
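A sketch of the ranking idea, assuming no tied values (the data and function names are made up for illustration). The ordinary correlation is pulled down by the outlying x value, while the rank-based version is not:

```python
import statistics

def ranks(values):
    """Rank 1 for the smallest value, 2 for the next, and so on (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    # Correlation computed on the ranks instead of the raw values
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 100]     # last x value is an outlier
y = [1, 8, 27, 64, 125]
print(pearson(x, y), spearman_rho(x, y))
```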
Term
Residual
Definition
The difference between the observed y value of a point and the y value predicted by the linear regression (also referred to as y(hat)): residual = y - y(hat)
Term
Line of Best Fit
Definition
Also known as the least squares line: the line that minimizes the sum of the squared residuals
Term
Linear Regression equation
Definition
y(hat)= b0+ b1(x)
Term
b1 (The slope of linear regression) equation
Definition
r (sy/sx)
or
the correlation times (standard deviation of y/ standard deviation of x)
Term
b0 (y intercept)
Definition
y(avg) - b1 * x(avg)
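The slope, intercept, and residual cards fit together as follows. A minimal sketch on made-up data (variable names mirror the cards; the data is an assumption):

```python
import statistics

x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]

def correlation(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (len(x) - 1)

r = correlation(x, y)
b1 = r * statistics.stdev(y) / statistics.stdev(x)   # slope: r (sy/sx)
b0 = statistics.mean(y) - b1 * statistics.mean(x)    # intercept: y(avg) - b1 * x(avg)

predicted = [b0 + b1 * value for value in x]          # y(hat) = b0 + b1(x)
residuals = [obs - pred for obs, pred in zip(y, predicted)]

print(round(b0, 3), round(b1, 3))
print([round(e, 3) for e in residuals])
```

For a least squares line the residuals sum to zero, up to rounding.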
Term
R^2 value
Definition
Gives the fraction (between 0 and 1) of the data's variation accounted for by the model. It is the square of the correlation r
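A sketch showing that "fraction of variation accounted for" and "r squared" give the same number, on made-up data. The slope here is computed from sums of squares, which is algebraically the same as r (sy/sx):

```python
x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)   # total variation in y

b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
r = sxy / (sxx * syy) ** 0.5

# Variation left over in the residuals after fitting the line
leftover = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

r_squared = 1 - leftover / syy
print(round(r_squared, 3), round(r ** 2, 3))
```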
Term
Does the Plot Thicken? Condition
Definition
When you plot the residuals against the predicted values, there should be no discernible pattern. In particular, the spread should not thicken or thin across the plot. If it does, your model isn't ideal
Term
Inverting the Regression
Definition
You can't simply rearrange the regression line equation to predict x from y unless the correlation is exactly 1.0 or -1.0. You must do the b1 and b0 formulas again with the roles of x and y swapped
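A sketch of why rearranging fails, on made-up data. Solving the y-on-x line for x implies a slope of 1/b1, but the actual x-on-y regression slope is r (sx/sy), and the two agree only when r squared is 1:

```python
x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

b1_y_on_x = sxy / sxx    # slope for predicting y from x
b1_x_on_y = sxy / syy    # slope for predicting x from y (formulas redone)
naive = 1 / b1_y_on_x    # slope you would get by just rearranging

print(round(b1_y_on_x, 3), round(b1_x_on_y, 3), round(naive, 3))
```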
Term
Leverage
Definition
The extent to which a point can pull on the regression line. Points whose x values lie far from the mean of x have high leverage
Term
Subsets
Definition
Distinguishable groups within the data (male/female, etc.) that allow you to fit a different regression line to each segment
Term
Goals of Re-expression
Definition
1. Make the distribution of a variable more symmetric
2. Make the spread of several groups (as seen in side-by-side boxplots) more alike, even if their centers differ (often achieved with logs)
3. Make the form of a scatterplot more nearly linear
4. Make the scatter in a scatterplot spread out evenly rather than thickening at one end
Term
Ladder of Powers: 2
Definition
Square the data values. Try for unimodal, left-skewed histograms
Term
Ladder of Powers: "0" aka Logs
Definition
This is the go-to. Logs are undefined for zero or negative numbers, so if the data contains any, add a small constant to all data values first. Try logging y, then logging x, and if all else fails log both.
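A small sketch of a log re-expression pulling in a long right tail, on made-up data. The constant of 1 added in the second part is an assumption, used only to show how zeros can be handled:

```python
import math
import statistics

y = [1, 10, 100, 1000, 10000]  # made-up, strongly right-skewed data

# Raw data: the mean is dragged far above the median by the long tail
print(statistics.mean(y), statistics.median(y))

# After logging, the values are evenly spaced and mean equals median
log_y = [math.log10(value) for value in y]
print(statistics.mean(log_y), statistics.median(log_y))

# Data containing a zero: shift everything by a small constant before logging
with_zero = [0, 9, 99]
shifted_logs = [math.log10(value + 1) for value in with_zero]
print(shifted_logs)
```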
Term
Ladder of Powers: -1/2
Definition
The negative reciprocal square root, -1/sqrt(y). The negative sign preserves the direction of relationships. A last resort
Term
Ladder of Powers:-1
Definition
The reciprocal, 1/y. Use it as positive or negative (-1/y), depending on which way you want the data to go. Ratios of 2 quantities benefit the most.
Term
Sample Strategies and Ideals to keep in mind:
Definition
1: Examine a Part of the Whole: Try to avoid bias by representing all parts of the population in proportion to their share of the whole
2: Randomize: When in doubt, select at random, so that nothing about how individuals are chosen can be associated with what you are measuring
3: It's the Sample Size: The fraction of the population sampled doesn't matter, just the actual sample size (2,000 is a good number).
Term
Census
Definition
A sample that consists of the entire population, often quite inefficient
Term
Parameter v. Statistics
Definition
Parameters are the true values describing the population that we are trying to get at, often in vain.
Statistics are anything we calculate from sample data
Term
Simple Random Sample (SRS)
Definition
A method by which every possible combination of individuals of the chosen sample size is equally likely to be selected. The standard against which other sampling methods are compared
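A minimal sketch of drawing an SRS from a hypothetical sampling frame of 100 students (the frame, seed, and sample size are assumptions for illustration):

```python
import random

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Hypothetical sampling frame: the list the sample is drawn from
sampling_frame = [f"student_{i}" for i in range(100)]

# random.sample draws without replacement; every set of 10 is equally likely
srs = rng.sample(sampling_frame, 10)
print(srs)
```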
Term
Sampling Frame
Definition
The list of individuals from which the sample is drawn
Term
Stratified Random Sampling
Definition
Dividing the population into distinct groups (strata) of similar individuals, and using a simple random sample within each stratum.
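A sketch of stratifying by class year and sampling 10% of each stratum (the strata, sizes, and fraction are made up for illustration):

```python
import random

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Hypothetical population already split into strata
strata = {
    "Freshman": [f"F{i}" for i in range(40)],
    "Senior": [f"S{i}" for i in range(20)],
}
fraction = 0.1  # sample 10% of each stratum

# A separate simple random sample inside every stratum
sample = {
    name: rng.sample(members, round(len(members) * fraction))
    for name, members in strata.items()
}
print(sample)
```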
Term
Cluster Sampling
Definition
Splitting the population into clusters that each resemble the population as a whole, then selecting one or a few clusters at random and sampling them. If the clusters don't represent the population as a whole, the sample will be biased. Can also be one stage of a multistage sample
Term
Systematic Sample
Definition
When you use a nonrandom, but systematic, selection of individuals. For example, selecting every 20th person on a list of the population. The starting point should be chosen at random.
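A sketch of taking every 20th person from a hypothetical list of 200, with a random starting position (the list, step, and seed are assumptions for illustration):

```python
import random

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

population = list(range(1, 201))  # hypothetical list of 200 people
step = 20

start = rng.randrange(step)        # random start among the first 20 positions
sample = population[start::step]   # then every 20th person after that

print(sample)
```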
Term
Pilot
Definition
A trial run of a survey before it is employed in a larger group at higher cost. Gives you a chance to recognize flaws in your design
Term
Sampling Technique Errors
Definition
Voluntary Response Sample: Because it is self-selected, it is inherently biased
Convenience Sampling: Sampling whoever is easiest to reach; usually produces biased information
Term
Mistakes Which Can Arise
Definition
Nonrespondents: It's always a good investment to limit the number of nonrespondents, because leaving them out can shift the data
Response Bias: Anything in the survey which influences responses (wording of a question, the environment it's taken in)
Term
Observational Studies
Definition
When people or subjects are observed in their natural environments, with no treatments assigned by the researcher. Often retrospective studies
Term
Prospective v. Retrospective Studies
Definition
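Prospective studies select individuals (ideally at random) in advance and follow them for a given amount of time, collecting data as events unfold. Retrospective studies select subjects and then look back at their past records. Prospective studies are generally favored over retrospective options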
Term
Experiment
Definition
When you attempt to isolate the effect of a few simple variables through random assignment of treatments to subjects. Requires active manipulation by researchers.
Term
The 4 Principles of Experimental Design
Definition
1. Control: Control sources of variation other than what we are testing
2. Randomization: Equalizes the effects of unforeseen or uncontrollable sources of variation
3. Replicate: Apply each treatment to several subjects, and repeat the experiment in slightly altered situations to show the results are not a fluke or biased
4. Block: Sometimes attributes of the subjects affect outcomes of an experiment, so grouping similar subjects into blocks and randomizing within each block is more accurate
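Randomization and blocking can be sketched together: group similar subjects into blocks, then assign treatments at random inside each block. The subjects, block names, and treatment labels below are hypothetical:

```python
import random

rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Hypothetical subjects, blocked by an attribute that may affect the outcome
blocks = {
    "male": ["M1", "M2", "M3", "M4"],
    "female": ["F1", "F2", "F3", "F4"],
}

assignment = {}
for block_name, subjects in blocks.items():
    shuffled = subjects[:]          # copy so the original list is untouched
    rng.shuffle(shuffled)           # random order within the block
    half = len(shuffled) // 2
    for subject in shuffled[:half]:
        assignment[subject] = "treatment"
    for subject in shuffled[half:]:
        assignment[subject] = "control"

print(assignment)
```

Each block ends up with the same number of treated and control subjects, so the blocking attribute cannot be confused with the treatment effect.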
Term
Blinding
Definition
Limiting the influence that knowledge of the treatment can have on the experiment, by keeping key categorical variables (such as which treatment a subject received) a secret from the subject and from the researcher. An experiment is "double blind" when even those who interpret the data are unaware of which treatment each subject received.
Term
Matching
Definition
Pairing subjects because they are similar in ways not under study