PS6 Midterm 1
UCLA - Introduction to Data Analysis
24
Political Studies
02/11/2018

Term
 The whiskers of a boxplot tell you...
Definition
 the maximum and minimum of the data, excluding any points flagged as outliers
Term
 The top of the box in a boxplot tells you...
Definition
 the 3rd quartile
Term
 The bottom of the box in a boxplot tells you...
Definition
 the 1st quartile
Term
 The bar/line in a boxplot tells you...
Definition
 the median
Term
 The formula to determine whether a point will be an outlier is...
Definition
 A point is an outlier if it falls below Q1 - (1.5 x IQR) or above Q3 + (1.5 x IQR), where IQR = Q3 - Q1
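A minimal sketch of the 1.5 x IQR outlier check, assuming the usual convention of measuring the fences from the quartiles (Q1 and Q3). The data values and function names are made up for illustration, and quartile conventions differ slightly between textbooks and software, so the exact fences may not match a hand calculation.

```python
import statistics

def outlier_fences(values):
    """Return (lower_fence, upper_fence) using the 1.5 x IQR rule."""
    # statistics.quantiles with n=4 returns [Q1, median, Q3]
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    """Return the values that fall outside the fences."""
    lower, upper = outlier_fences(values)
    return [v for v in values if v < lower or v > upper]

# Hypothetical data: one value sits far above the rest
data = [2, 4, 5, 6, 7, 8, 9, 30]
print(outlier_fences(data))
print(find_outliers(data))
```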
Term
 What is a parameter?
Definition
 a summary measure that is a property of the population. It is what a researcher is trying to estimate or test in a study. For example, the mean height of all 20-year-old women.
Term
 What is a statistic?
Definition
 a summary number that is a property of the sample. It is the estimate of the parameter that the study actually produces. For example, if the mean height in a sample of 20-year-old women is 5 feet 4 inches, that sample mean is the statistic.
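To make the parameter/statistic distinction concrete, here is a small sketch. The heights (in inches) are invented, and treating a ten-value list as the whole "population" is purely an assumption for illustration; in a real study the population is usually too large to measure.

```python
import random
import statistics

# Hypothetical population of heights in inches (made-up values)
population_heights = [60, 61, 62, 63, 64, 64, 65, 66, 67, 68]

# Parameter: a property of the whole population
parameter = statistics.mean(population_heights)

# Statistic: the same summary computed on a sample drawn from the population
rng = random.Random(0)  # fixed seed so the sample is repeatable
sample_heights = rng.sample(population_heights, 4)
statistic = statistics.mean(sample_heights)

print("parameter (population mean):", parameter)
print("statistic (sample mean):", statistic)
```

The statistic will usually be close to, but not exactly equal to, the parameter, and it changes from sample to sample.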
Term
 Independent Variable
Definition
 The variable presumed to influence the outcome. It is treated as not being affected by the other variables in the study.
Term
 Explanatory Variable
Definition
 A type of independent variable whose independence can't be assumed for certain. Conventionally plotted on the x-axis.
Term
 Outcome Variable
Definition
 basically the same thing as a dependent variable. It is important to find a good outcome variable for the respective study. E.g., the Post Office's outcome variables for the "bigness" of a package are weight in lbs and girth in inches.
Term
 Standard Deviation (Definition)
Definition
 Typical deviation from the mean: the "give or take" around the mean, roughly the average distance from the average. A smaller standard deviation means the data are clustered more tightly around the mean.
Term
 Standard Deviation (Formula) and steps to find
Definition
 It is the SQUARE ROOT of the variance: s = sqrt( sum of (x_i - mean)^2 / (n - 1) )
 Step 1: Find the mean, n, and n-1.
 Step 2: Subtract the mean from each value.
 Step 3: Square each difference.
 Step 4: Add up the squared differences.
 Step 5: Divide the sum by n-1.
 Step 6: Take the square root of the quotient.
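The six steps can be sketched in code as follows. The data values and the function name are made up for illustration; the result is compared against the standard library's `statistics.stdev`, which also divides by n-1.

```python
import math
import statistics

def sample_standard_deviation(values):
    """Follow the six steps: mean, deviations, squares, sum, divide by n-1, square root."""
    n = len(values)
    mean = sum(values) / n                      # Step 1: mean, n (n-1 is used below)
    deviations = [v - mean for v in values]     # Step 2: subtract the mean from each value
    squared = [d ** 2 for d in deviations]      # Step 3: square each difference
    total = sum(squared)                        # Step 4: add up the squared differences
    variance = total / (n - 1)                  # Step 5: divide by n-1 (this is the variance)
    return math.sqrt(variance)                  # Step 6: square root of the quotient

# Hypothetical data
data = [2, 4, 4, 4, 5, 5, 7, 9]
sd = sample_standard_deviation(data)
print(round(sd, 3))
print(round(statistics.stdev(data), 3))  # library version, for comparison
```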
Term
 Variance
Definition
 It measures spread, like the standard deviation; it is the standard deviation squared. Its units are the squared units of the data, so they are not easily interpretable.
Term
 Covariance
Definition
 the average amount that x and y deviate from their means together. Used to calculate correlation.
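A minimal sketch of the covariance calculation, assuming the sample version that divides by n-1 (the same divisor used for the standard deviation). The variable names and data are hypothetical.

```python
def sample_covariance(xs, ys):
    """Average product of the deviations of x and y from their means (n-1 divisor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data: hours studied and quiz scores
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(sample_covariance(hours, scores))
```

A positive result means x and y tend to be above (or below) their means at the same time; a negative result means they tend to move in opposite directions.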
Term
 Correlation
Definition
 The bounded version of covariance: the covariance divided by the product of the two standard deviations. Always between -1 and 1. A value of exactly -1 or 1 means a perfect linear relationship. No units.
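As a rough sketch, correlation can be computed by dividing the sample covariance by the product of the two sample standard deviations. The function name and data below are assumptions for illustration.

```python
import statistics

def correlation(xs, ys):
    """Covariance of x and y divided by (SD of x) * (SD of y)."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return covariance / (statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical data
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(round(correlation(hours, scores), 3))

# A perfect linear relationship gives a correlation of 1 (or -1 if decreasing)
print(round(correlation([1, 2, 3], [2, 4, 6]), 3))
print(round(correlation([1, 2, 3], [6, 4, 2]), 3))
```

Because the units of x and y cancel in the division, the result has no units.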
Term
 Limits of Correlation
Definition
 - Variables must be continuous
 - Restricted to linear relationships
 - Does not imply causation
 - Sensitive to outliers
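Two of these limits can be seen in a short sketch with made-up data: a perfect but non-linear relationship that gives a correlation of zero, and a single outlier that flips a perfectly negative correlation to a strongly positive one. The helper name `pearson_r` is hypothetical.

```python
import statistics

def pearson_r(xs, ys):
    """Sample correlation: covariance divided by the product of the SDs."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return covariance / (statistics.stdev(xs) * statistics.stdev(ys))

# Limit 1: y is completely determined by x (y = x squared), but not linearly
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_curved = pearson_r(xs, ys)

# Limit 2: a perfectly decreasing line, then the same data plus one extreme point
r_line = pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
r_with_outlier = pearson_r([1, 2, 3, 4, 5, 20], [5, 4, 3, 2, 1, 20])

print(r_curved, r_line, r_with_outlier)
```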
Term
 Pros of Histograms
Definition
 - Show the shape of a distribution (e.g. bimodality)
 - Show patterns in the data
 - Organize data into bins
Term
 Cons of Histograms
Definition
 - Hard to eyeball summary statistics like the mean and SD
 - Easily manipulated (e.g. by changing the bin width)
 - Difficult to interpret when N is small
Term
 Pros of Boxplots
Definition
 - Provide a quick 5-number summary of the data (minimum, Q1, median, Q3, maximum)
 - Make outliers easy to identify
 - Easy to draw side by side to compare groups
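The 5-number summary that a boxplot draws can be sketched as below. The scores are made up, and the quartiles use the default method of Python's `statistics.quantiles`; other textbooks or software may use a slightly different quartile convention.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# Hypothetical test scores
scores = [55, 60, 62, 70, 71, 75, 80]
print(five_number_summary(scores))
```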
Term
 Cons of Boxplots
Definition
 - Don't show the full shape of the distribution
 - Hide patterns like bimodality and clusters
 - Skewed distributions make for an uninformative boxplot
Term
 Normal Distribution
Definition
 Same thing as a bell curve. Describes lots of real-world phenomena. The normal density function requires the mean and standard deviation.
 Features:
 - Symmetric around the mean
 - Mean, median, and mode are equal
 - Area under the curve is equal to 1
 - About 68% of the area is within 1 SD of the mean
 - About 95% of the area is within 2 SDs of the mean
Term
 The Empirical Rule
Definition
 The 68-95-99.7 rule.
 How to use the empirical rule: use it with a normal distribution. Say the given mean is 30 and the SD is 5. One SD on either side of the mean runs from 25 to 35, so about 68% of the x values lie between 25 and 35. Two SDs on either side of the mean runs from 20 to 40, so about 95% of the observations lie between 20 and 40. Three SDs runs from 15 to 45, so about 99.7% lie between 15 and 45.
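The percentages in the rule can be checked numerically. This sketch uses the card's example (mean 30, SD 5) with `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist

# Normal distribution from the example: mean 30, SD 5
dist = NormalDist(mu=30, sigma=5)

def share_within(k):
    """Share of the area within k SDs of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    print(f"within {k} SD ({low} to {high}): {share_within(k):.4f}")
```

The computed shares are approximately 0.68, 0.95, and 0.997, which is where the rule's name comes from.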
Term
 Right skewed
Definition
 mean > median. The histogram trails off to the right. World income is an example.
Term
 Left skewed
Definition
 mean < median. The histogram trails off to the left.
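A small sketch of how skew pulls the mean away from the median. Both data sets are invented for illustration, and the mean-versus-median comparison is a rule of thumb rather than a guarantee for every data set.

```python
import statistics

def mean_and_median(values):
    """Return (mean, median) so the two can be compared."""
    return statistics.mean(values), statistics.median(values)

# Hypothetical right-skewed data: one large value drags the mean upward
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]

# Hypothetical left-skewed data: one small value drags the mean downward
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]

print("right skewed:", mean_and_median(right_skewed))
print("left skewed:", mean_and_median(left_skewed))
```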
