Term
|
Definition
|
|
Term
| Mean |
Definition
| Also known as the average: the sum of the values divided by the number of values. The most common descriptor of the center of the data. |
|
|
Term
| Median |
Definition
| Midpoint of the data set. Half the values lie above it and half lie below. |
|
|
Term
| Mode |
Definition
| The most frequent value in a data set. |
|
|
Term
| Range |
Definition
| Difference between highest and lowest value. |
|
|
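The four descriptors above (mean, median, mode, range) can be computed with Python's standard `statistics` module. This is a minimal sketch; the sample values are made up purely for illustration.

```python
from statistics import mean, median, mode

# Hypothetical sample values, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

center_mean = mean(data)              # sum of the values / number of values
center_median = median(data)          # midpoint of the sorted values
most_frequent = mode(data)            # value that appears most often
value_range = max(data) - min(data)   # highest value minus lowest value

print(center_mean, center_median, most_frequent, value_range)
```

With an even number of values, `median` averages the two middle values.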
Term
|
Definition
| A marked split in the values. Each bundle of values is called a cluster. |
|
|
Term
| Frequency |
Definition
| Number of times a value appears in the data. |
|
|
Term
| Quartiles |
Definition
| Division of the ordered data into four groups, each containing about a quarter of the values. Used in boxplots. |
|
|
Term
| Interquartile Range (IQR) |
Definition
| The difference between the first and third quartiles in a data set. |
|
|
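A short sketch of finding quartiles and the interquartile range in Python. Textbooks and software use slightly different conventions for locating quartiles; the `method="inclusive"` choice and the sample values here are assumptions made for illustration, so results may differ a little from a hand calculation.

```python
from statistics import quantiles

# Hypothetical sample values, chosen only for illustration.
data = [1, 3, 4, 6, 7, 9, 12, 15]

# n=4 splits the sorted data into four groups and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # third quartile minus first quartile

print(q1, q2, q3, iqr)
```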
Term
|
Definition
| How far the data is from the central value. Measured by standard deviation. |
|
|
Term
|
Definition
| How strongly a value will impact the mean. That is, a very large or very small value will affect the mean more than a central value. |
|
|
Term
|
Definition
| A measure of how distant values are from the center of the data. |
|
|
Term
| Empirical Rule of Standard Deviation |
|
Definition
| In roughly bell-shaped data sets, about 68% of values fall within one standard deviation (SD) of the mean, 95% within two SDs, and 99.7% within three. Also known as the "68-95-99.7 Rule." |
|
|
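The percentages in the empirical rule can be checked against an exactly normal (bell-shaped) distribution. This sketch assumes a standard normal curve; real data that is only roughly bell-shaped will match less closely.

```python
from statistics import NormalDist

# Assumption: a perfectly normal distribution with mean 0 and SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The printed shares are close to, but not exactly, 68%, 95%, and 99.7%, because the rule uses rounded figures.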
Term
| Z-Score |
Definition
| AKA standard score. A measure of the number of standard deviations a value is from the center. |
|
|
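A minimal sketch of a standard score calculation. It assumes the "center" is the mean and uses the sample standard deviation (dividing by n minus one); the helper name `z_score` and the data are hypothetical.

```python
from statistics import mean, stdev

def z_score(value, data):
    """Number of sample standard deviations that value lies from the mean."""
    return (value - mean(data)) / stdev(data)

# Hypothetical sample values, chosen only for illustration.
data = [4, 8, 6, 5, 7]

print(z_score(8, data))  # positive: 8 is above the mean
print(z_score(6, data))  # zero: 6 is the mean of this sample
```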
Term
| Percentile |
Definition
| The percent of values that are lower than an examined value. |
|
|
Term
|
Definition
| Misuse of statistics in which the height of a histogram is correctly represented but the area is not. |
|
|
Term
| Simpson's Paradox |
Definition
| This occurs when a conclusion based on individual groups of data is contradicted when the groups are combined. |
|
|
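The reversal described above is easier to see with numbers. The counts below are illustrative only (successes and trials for two hypothetical treatments, A and B, in two groups): A has the higher success rate inside each group, yet B has the higher rate once the groups are combined.

```python
# Illustrative counts only: (successes, trials) per treatment and group.
results = {
    "A": {"group 1": (81, 87), "group 2": (192, 263)},
    "B": {"group 1": (234, 270), "group 2": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

def combined_rate(groups):
    """Success rate after pooling every group for one treatment."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return total_successes / total_trials

for group in ("group 1", "group 2"):
    a = rate(*results["A"][group])
    b = rate(*results["B"][group])
    print(f"{group}: A={a:.2f}, B={b:.2f}")

a_all = combined_rate(results["A"])
b_all = combined_rate(results["B"])
print(f"combined: A={a_all:.2f}, B={b_all:.2f}")
```

The reversal happens because the two treatments were applied to the groups in very different proportions.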
Term
|
Definition
|
|
Term
| Regression |
Definition
| Fitting a mathematical expression to explain a paired data set. |
|
|
Term
| Positive Correlation |
Definition
| The explanatory and response variables increase and decrease together. |
|
|
Term
| Negative Correlation |
Definition
| The explanatory and response variables move in opposite directions: as one increases, the other decreases. |
|
|
Term
| Linear Correlation Coefficient |
|
Definition
| AKA "r". A measure of the strength and direction of the linear relationship between two paired variables, ranging from -1 to 1. The closer r is to -1 or 1, the more closely the data fit a line. |
|
|
Term
|
Definition
| 1) Standardize the variables (value minus center, divided by SD). 2) Multiply each standardized x value by its corresponding standardized y value. 3) Divide the sum of those products by the number of terms minus one. |
|
|
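The three steps above can be sketched in Python as follows. The function name `correlation` and the paired data are hypothetical; the sketch assumes the "center" is the mean and the SD is the sample standard deviation.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Linear correlation coefficient r, following the three listed steps."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)

    # Step 1: standardize each variable (value minus mean, divided by SD).
    zx = [(x - x_bar) / s_x for x in xs]
    zy = [(y - y_bar) / s_y for y in ys]

    # Step 2: multiply each standardized x by its paired standardized y.
    products = [a * b for a, b in zip(zx, zy)]

    # Step 3: divide the sum of the products by the number of pairs minus one.
    return sum(products) / (n - 1)

# Hypothetical paired data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(correlation(xs, ys))
```

Points that fall exactly on an upward-sloping line give an r of 1 (up to floating-point rounding).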
Term
| Residual |
Definition
| The difference between the observed value and the value predicted by the regression (observed minus predicted). |
|
|
Term
| Sum of Squared Errors |
Definition
| AKA "SSE". Sum of the squared residuals in a set. Used to compare how well different regressions fit the data. |
|
|
Term
| Least Squares Regression Line |
|
Definition
| The line with the lowest SSE. This means it is the best possible linear fit for the data set. |
|
|
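A minimal sketch of finding the least squares line and its SSE. The closed-form slope and intercept used here are the standard textbook formulas, not something stated in these cards; the function names and data are hypothetical.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line that minimizes the SSE."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def sse(xs, ys, slope, intercept):
    """Sum of squared residuals (observed minus predicted) for a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical paired data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
best = sse(xs, ys, slope, intercept)
other = sse(xs, ys, slope + 0.1, intercept)  # a slightly different line

print(slope, intercept, best, other)
```

Any other line, such as the one with the nudged slope, produces a larger SSE than the least squares line.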
Term
| Causation |
Definition
| When the explanatory variable is shown to affect the response variable. Be sure there is no lurking variable that better explains the correlation. |
|
|
Term
|
Definition
| The difference between a value and the center of the data set. |
|
|
Term
| Explained Deviation |
Definition
| The part of the difference between an examined response value and the average response that can be attributed to the explanatory variable (the predicted value minus the average response). |
|
|
Term
| Unexplained Deviation |
Definition
| The difference between total deviation and explained deviation (the observed value minus the predicted value). |
|
|
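The relationship between the three deviations can be sketched for a single point. The data are hypothetical, and the slope and intercept are assumed to be the least squares line for these points, supplied here as fixed numbers.

```python
from statistics import mean

# Hypothetical paired data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Assumption: least squares line for these points, computed beforehand.
slope, intercept = 0.6, 2.2
y_bar = mean(ys)  # average response

def deviations(x, y):
    """Return (total, explained, unexplained) deviation for one point."""
    y_hat = intercept + slope * x    # value predicted by the regression
    total = y - y_bar                # observed minus average response
    explained = y_hat - y_bar        # predicted minus average response
    unexplained = total - explained  # equals observed minus predicted
    return total, explained, unexplained

print(deviations(5, 5))
```

For every point, the explained and unexplained deviations add up to the total deviation.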
Term
| Residual Standard Deviation |
|
Definition
| Standard deviation calculated from the residuals, the differences between expected and observed values. |
|
|
Term
|
Definition
|
|
Term
| Explanatory/Independent Variable |
|
Definition
| Variable used to explain or predict changes in the response variable. Usually plotted on the x-axis. |
|
|
Term
| Response/Dependent Variable |
|
Definition
| Variable whose changes are explained or predicted by the explanatory variable. Usually plotted on the y-axis. |
|
|
Term
| Numerical/Quantitative Variable |
|
Definition
| Variable defined by a number (IQ, height, time, etc.). |
|
|
Term
| Categorical/Qualitative Variable |
|
Definition
| Variable described by a word or category (eye color, major, name, etc.). |
|
|
Term
| Continuous Variable |
Definition
| Variable that can take any value within a range (GPA, weight, etc.). |
|
|
Term
| Discrete Variable |
Definition
| Variable that can take only one of a set number of values (siblings, shoe size, etc.). |
|
|
Term
| Ordinal Variable |
Definition
| Categorical variable that has an inherent hierarchy of values (grades, business rating, military rank, etc.). |
|
|
Term
| Lurking Variable |
Definition
| Variable that is not immediately obvious and that may lead to incorrect conclusions. |
|
|
Term
| Outlier |
Definition
| A value that is far removed from the rest of the data. It should be removed only if it is a mistake. |
|
|
Term
|
Definition
| Written with a hat on top (for example, ŷ). Denotes the value predicted by a regression. |
|
|
Term
| Influential Point |
Definition
| Extreme outlier that significantly changes the regression line. |
|
|
Term
|
Definition
| Frequency of values is greatest near the median and least at the extremes of the range. |
|
|
Term
| Uniform Distribution |
Definition
| The frequency of values is consistent across the entire range. |
|
|
Term
|
Definition
| Frequency of values is least near the median and greatest at the extremes of the range. |
|
|
Term
| Symmetric Distribution |
Definition
| Data is almost mirrored on each side of the central value. |
|
|
Term
| Right Skewed Distribution |
|
Definition
| Data is more frequent in lower values. (Long tail to the right) |
|
|
Term
| Left Skewed Distribution |
Definition
| Data is more frequent in higher values. (Long tail to the left) |
|
|
Term
|
Definition
|
|
Term
| Dot Plot |
Definition
| Graph that uses stacked dots to show frequency. (Most useful for small ranges with repeated values) |
|
|
Term
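| Stem-and-Leaf Plot |
Definition
| Table that groups values by their leading digits (the stem, such as the tens place) and lists each value's final digit (the leaf) beside its stem. |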
|
|
Term
| Pie Chart |
Definition
| Useful when one wants to call attention to the relative frequency of categories. |
|
|
Term
| Bar Graph |
Definition
| Common bar graph: each bar represents a value, and its height represents that value's frequency. All bars are the same width. |
|
|
Term
| Pareto Chart |
Definition
| Special bar graph in which unranked categorical variables are listed from left to right in descending order of frequency. |
|
|
Term
| Histogram |
Definition
| A graph that uses bars, possibly of uneven widths, to represent ranges of values. The area of each bar represents the frequency of the values in its range. |
|
|
Term
| Steps of Drawing a Histogram |
|
Definition
| 1) Calculate the percentage of values in each group. 2) Find the height of each bar by dividing its percentage by its width. 3) Draw the histogram. |
|
|
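Steps 1 and 2 above can be sketched in Python. The sketch assumes that the height of a bar is its percentage divided by its width, so that the bar's area equals the percentage of values in the group. The bin edges, data, and function name are hypothetical, and the drawing step is left to whatever plotting tool is at hand.

```python
# Hypothetical bins as (lower edge, upper edge), with uneven widths.
bins = [(0, 10), (10, 20), (20, 50)]

# Hypothetical sample values, chosen only for illustration.
data = [3, 7, 12, 15, 18, 19, 25, 31, 44, 48]

def bar_heights(data, bins):
    """Height of each histogram bar so that its area equals its percentage."""
    n = len(data)
    heights = []
    for low, high in bins:
        count = sum(1 for v in data if low <= v < high)
        percent = 100 * count / n               # step 1: percentage in the group
        heights.append(percent / (high - low))  # step 2: height = percent / width
    return heights

heights = bar_heights(data, bins)
print(heights)
```

The last two groups hold the same share of the data, but the wider one gets a shorter bar so that the areas stay equal.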
Term
| Boxplot |
Definition
| A graph that divides the data into four ranges, called quartiles, each containing about the same number of values. |
|
|
Term
|
Definition
| A value that lies more than three times the interquartile range (IQR) below the first quartile or above the third quartile is an outlier. A value that lies between 1.5 and three times the IQR beyond those quartiles is a potential outlier. |
|
|
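The rule above can be written as a small function. The name `classify` and the quartile values in the usage lines are hypothetical; the thresholds (1.5 and three times the IQR) come from the definition.

```python
def classify(value, q1, q3):
    """Label a value using the 1.5 x IQR and 3 x IQR thresholds."""
    iqr = q3 - q1
    # Distance below the first quartile or above the third (0 if in between).
    distance = max(q1 - value, value - q3, 0)
    if distance > 3 * iqr:
        return "outlier"
    if distance > 1.5 * iqr:
        return "potential outlier"
    return "not an outlier"

# Hypothetical quartiles: Q1 = 10, Q3 = 20, so the IQR is 10.
for value in (25, 40, 60, -30):
    print(value, classify(value, 10, 20))
```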
Term
|
Definition
| When the data of two dependent variables is represented on the same graph. |
|
|
Term
| Scatterplot |
Definition
| A graph of paired data represented by points. |
|
|
Term
| Residual Plot |
Definition
| A plot of the residuals against the explanatory variable, which turns the regression line into a horizontal line at zero. Helps determine whether points are evenly distributed above and below it. |
|
|