Term
|
Definition
| a goodness-of-fit measure in multiple regression analysis that penalizes additional explanatory variables by using a degrees of freedom adjustment in estimating the error variance |
|
|
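A minimal numpy sketch of the degrees-of-freedom adjustment described above, assuming fitted values y_hat from a regression with k explanatory variables (the function and variable names are hypothetical):

import numpy as np

def adjusted_r_squared(y, y_hat, k):
    """Adjusted R-squared: 1 - [SSR/(n - k - 1)] / [SST/(n - 1)]."""
    n = len(y)
    ssr = np.sum((y - y_hat) ** 2)        # sum of squared residuals
    sst = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))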
Term
|
Definition
| the hypothesis against which the null hypothesis is tested |
|
|
Term
|
Definition
| the sum of numbers divided by n |
|
|
Term
|
Definition
| the group represented by the overall intercept in a multiple regression model that includes dummy explanatory variables |
|
|
Term
|
Definition
| a ceteris paribus change in one variable has an effect on another variable |
|
|
Term
|
Definition
| the difference between the expected value of an estimator and the population value it is supposed to be estimating |
|
|
Term
|
Definition
| an estimator whose expectation, or mean of its sampling distribution, differs from the population value it is supposed to be estimating |
|
|
Term
|
Definition
| a test for heteroskedasticity where the squared OLS residuals are regressed on the explanatory variables in the model |
|
|
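A sketch of the auxiliary regression this test uses, assuming numpy arrays y and X where X does not yet contain a constant column (statsmodels also ships het_breuschpagan for the same purpose):

import statsmodels.api as sm

def bp_test(y, X):
    """Regress squared OLS residuals on the explanatory variables; return the auxiliary F statistic and p-value."""
    X = sm.add_constant(X)
    u2 = sm.OLS(y, X).fit().resid ** 2   # squared OLS residuals
    aux = sm.OLS(u2, X).fit()            # auxiliary regression on the regressors
    return aux.fvalue, aux.f_pvalue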
Term
|
Definition
| all other relevant factors are held fixed |
|
|
Term
|
Definition
| a probability distribution obtained by adding the squares of independent standard normal random variables. The number of terms in the sum equals the degrees of freedom in the distribution |
|
|
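An illustrative simulation (not from the deck) of this construction: summing the squares of k independent standard normals yields a chi-square draw with k degrees of freedom.

import numpy as np

rng = np.random.default_rng(0)
k, draws = 5, 100_000
z = rng.standard_normal((draws, k))
chi2_samples = (z ** 2).sum(axis=1)   # each row becomes one chi-square(k) draw
print(chi2_samples.mean())            # close to k = 5, the mean of a chi-square(k)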
Term
|
Definition
| the multiple linear regression model under the full set of classical linear model assumptions |
|
|
Term
|
Definition
| a sample of natural clusters or groups that usually consist of people |
|
|
Term
|
Definition
| a rule used to construct a random interval so that a certain percentage of all data sets, determined by the confidence level, yields an interval that contains the population value |
|
|
Term
|
Definition
| the percentage of samples in which we want our confidence interval to contain the population value; 95% is the most common confidence level, but 90% and 99% are also used |
|
|
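A hedged sketch of how such an interval is typically built for a regression coefficient, assuming an estimate beta_hat, its standard error se, and df degrees of freedom (all names hypothetical):

from scipy import stats

def conf_interval(beta_hat, se, df, level=0.95):
    c = stats.t.ppf(1 - (1 - level) / 2, df)   # critical value from the t distribution
    return beta_hat - c * se, beta_hat + c * se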
Term
|
Definition
| an estimator converges in probability to the correct population value as the sample size grows |
|
|
Term
|
Definition
| a measure of linear dependence between two random variables |
|
|
Term
|
Definition
| in hypothesis testing, the value against which the test statistic is compared to determine whether or not the null hypothesis is rejected |
|
|
Term
|
Definition
| the interval at which time series data are collected. Yearly, quarterly, and monthly are the most common data frequencies |
|
|
Term
|
Definition
| in multiple regression analysis, the number of observations minus the number of estimated parameters |
|
|
Term
|
Definition
| the variable to be explained in the multiple regression model |
|
|
Term
|
Definition
| a variable that takes on the value of zero or one |
|
|
Term
|
Definition
| the mistake of including too many dummy variables among the independent variables; it occurs when an overall intercept is in the model and a dummy variable is included for each group |
|
|
Term
|
Definition
| an equation relating the dependent variable to a set of explanatory variables and unobserved disturbances, where unknown population parameters determine the ceteris paribus effect of each explanatory variable |
|
|
Term
|
Definition
| a relationship derived from economic theory or less formal economic reasoning |
|
|
Term
|
Definition
| the percentage change in one variable given a 1% ceteris paribus increase in another variable |
|
|
Term
|
Definition
| a term used to describe the presence of an endogenous explanatory variable |
|
|
Term
| endogenous explanatory variable |
|
Definition
| an explanatory variable in a multiple regression model that is correlated with the error term, either because of an omitted variable, measurement error, or simultaneity |
|
|
Term
|
Definition
| in simultaneous equation models, variables that are determined by the equations in the system |
|
|
Term
|
Definition
| the variable in a simple or multiple regression equation that contains unobserved factors that affect the dependent variable. The error term may also include measurement errors in the observed dependent or independent variables |
|
|
Term
|
Definition
| the variance of the error term in a multiple regression model |
|
|
Term
|
Definition
| the numerical value taken on by an estimator for a particular sample of data |
|
|
Term
|
Definition
| a rule for combining data to produce a numerical value for a population parameter; the form of the rule does not depend on the particular sample obtained |
|
|
Term
|
Definition
| any variable that is uncorrelated with the error term in the model of interest |
|
|
Term
|
Definition
| a measure of central tendency in the distribution of a random variable, including an estimator |
|
|
Term
|
Definition
| in probability, a general term used to denote an event whose outcome is uncertain. In econometric analysis, it denotes a situation where data are collected by randomly assigning individuals to control and treatment groups |
|
|
Term
| explained sum of squares (SSE) |
|
Definition
| the total sample variation of the fitted values in the multiple regression model |
|
|
Term
|
Definition
| in regression analysis, a variable that is used to explain variation in the dependent variable |
|
|
Term
|
Definition
| a mathematical function defined for all values of its argument that has an increasing slope but a constant proportionate change |
|
|
Term
|
Definition
| the probability distribution obtained by forming the ratio of two independent chi-square random variables, where each has been divided by its degrees of freedom |
|
|
Term
|
Definition
| a statistic used to test multiple hypotheses about the parameters in a multiple regression model |
|
|
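One common form of this statistic, sketched under the assumption that ssr_r and ssr_ur are the restricted and unrestricted sums of squared residuals, q is the number of restrictions, n the number of observations, and k the number of regressors in the unrestricted model:

from scipy import stats

def f_statistic(ssr_r, ssr_ur, q, n, k):
    f = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
    p_value = 1 - stats.f.cdf(f, q, n - k - 1)   # compare against an F(q, n - k - 1) distribution
    return f, p_value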
Term
|
Definition
| the estimated values of the dependent variable when the values of the independent variables for each observation are plugged into the OLS regression line |
|
|
Term
|
Definition
| the set of assumptions under which OLS is BLUE (best linear unbiased estimator): 1) linear in parameters 2) random sampling 3) sample variation in the explanatory variable 4) zero conditional mean 5) homoskedasticity |
|
|
Term
|
Definition
| the theorem that states that, under the five Gauss-Markov assumptions, the OLS estimator is BLUE (conditional on the sample values of the explanatory variables) |
|
|
Term
| best linear unbiased estimator (BLUE) |
|
Definition
| among all linear unbiased estimators, the estimator with the smallest variance. OLS is BLUE, conditional on the sample values of the explanatory variables, under the Gauss-Markov assumptions |
|
|
Term
|
Definition
| a statistic that summarizes how well a set of explanatory variables explains a dependent or response variable |
|
|
Term
|
Definition
| the bias in OLS due to omitted heterogeneity (or omitted variables) |
|
|
Term
|
Definition
| the variance of the error term, given the explanatory variables, is not constant |
|
|
Term
| heteroskedasticity of unknown form |
|
Definition
| heteroskedasticity that may depend on the explanatory variables in an unknown, arbitrary fashion |
|
|
Term
heteroskedasticity-robust F statistic
|
Definition
| an F-type statistic that is (asymptotically) robust to heteroskedasticity of unknown form |
|
|
Term
heteroskedasticity-robust LM statistic
|
Definition
| an LM statistic that is robust to heteroskedasticity of unknown form |
|
|
Term
| Heteroskedasticity-Robust Standard Error |
|
Definition
| a standard error that is (asymptotically) robust to heteroskedasticity of unknown form |
|
|
Term
| Heteroskedasticity-Robust t statistic |
|
Definition
| a t statistic that is (asymptotically) robust to heteroskedasticity of unknown form |
|
|
Term
|
Definition
| properties of estimators and test statistics that apply when the sample size grows without bound |
|
|
Term
|
Definition
| the errors in a regression model have constant variance conditional on the explanatory variables |
|
|
Term
| Assumption SLR.1 (Linear in Parameters) |
|
Definition
| In the population model, the dependent variable, y, is related to the independent variable, x, and the error (or disturbance), u, as y = β0 + β1x + u, where β0 and β1 are the population intercept and slope parameters, respectively. |
|
|
Term
| Assumption SLR. 2 (Random Sampling) |
|
Definition
| We have a random sample of size n, {(xi, yi): i = 1, 2, …, n}, following the population model y = β0 + β1x + u |
|
|
Term
| Assumption SLR.3 (Sample Variation in the Explanatory Variable) |
|
Definition
| The sample outcomes on x, namely, {xi, i = 1, …, n}, are not all the same value |
|
|
Term
| Assumption SLR.4 (Zero Conditional Mean) |
|
Definition
| The error u has an expected value of zero given any value of the explanatory variable. In other words, E(u|x) = 0 |
|
|
Term
| Assumption SLR.5 (Homoskedasticity) |
|
Definition
| The error u has the same variance given any value of the explanatory variable. In other words, Var(u|x) = σ² |
|
|
Term
|
Definition
| a statistical test of the null, or maintained, hypothesis against an alternative hypothesis |
|
|
Term
|
Definition
| the difference between the probability limit of an estimator and the parameter value |
|
|
Term
|
Definition
| an estimator does not converge (in probability) to the correct population parameter as the sample size grows |
|
|
Term
|
Definition
| in multiple regression, the partial effect of one explanatory variable depends on the value of a different explanatory variable |
|
|
Term
|
Definition
| an independent variable in a regression model that is the product of two explanatory variables |
|
|
Term
|
Definition
| in the equation of a line, the value of the y variable when the x variable is 0 |
|
|
Term
|
Definition
| the intercept in a regression model differs by group or time period |
|
|
Term
|
Definition
| the probability distribution determining the probabilities of outcomes involving two or more random variables |
|
|
Term
|
Definition
| a test involving more than one restriction on the parameters in a model |
|
|
Term
|
Definition
| failure to reject, using an F test at a specified significance level, that all coefficients for a group of explanatory variables are zero |
|
|
Term
| jointly statistically significant |
|
Definition
| the null hypothesis that two or more explanatory variables have zero population coefficients is rejected at the chosen significance level |
|
|
Term
| Lagrange Multiplier (LM) Statistic |
|
Definition
| a test statistic with a large-sample justification that can be used to test for omitted variables, heteroskedasticity, and serial correlation, among other model specification problems |
|
|
Term
|
Definition
| an estimator that minimizes a sum of squared residuals |
|
|
Term
|
Definition
| a regression model where the dependent variable and the independent variables are in level (or original) form |
|
|
Term
|
Definition
| a regression model where the dependent variable is in level form and (at least one of) the independent variables are in logarithmic form |
|
|
Term
|
Definition
| a function where the change in the dependent variable, given a one-unit change in an independent variable, is constant |
|
|
Term
|
Definition
| a mathematical function, defined only for positive arguments with a positive but decreasing slope |
|
|
Term
|
Definition
| a mathematical function, defined only for strictly positive arguments, with a positive but decreasing slope |
|
|
Term
|
Definition
| a regression model where the dependent variable is in logarithmic form and the independent variables are in level (or original) form |
|
|
Term
|
Definition
| a regression model where the dependent variable (and at least some of) the independent variables are in logarithmic form |
|
|
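A summary of how the slope is read under the four functional forms defined above (standard textbook interpretations, stated here for reference rather than quoted from the deck):
level-level: y = β0 + β1x, so a one-unit increase in x changes y by β1 units
log-level: log(y) = β0 + β1x, so a one-unit increase in x changes y by roughly 100·β1 percent
level-log: y = β0 + β1·log(x), so a 1% increase in x changes y by roughly β1/100 units
log-log: log(y) = β0 + β1·log(x), so a 1% increase in x changes y by roughly β1 percent (an elasticity)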
Term
|
Definition
|
|
Term
|
Definition
| the expected squared distance that an estimator is from the population value; it equals the variance plus the square of any bias |
|
|
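In symbols (a standard decomposition, not quoted from the deck): MSE(θ̂) = E[(θ̂ - θ)²] = Var(θ̂) + [Bias(θ̂)]², where θ is the population value being estimated.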
Term
|
Definition
| the difference between an observed variable and a variable that belongs in a multiple regression equation |
|
|
Term
|
Definition
| in a probability distribution, it is the value where there is 50% chance of being below the value and a 50% chance of being above it. In a sample of numbers, it is the middle value after the numbers have been ordered |
|
|
Term
|
Definition
| a data problem that occurs when we do not observe values of some variables for certain observations (individuals, cities, time periods, and so on) |
|
|
Term
|
Definition
| a term that refers to the correlation among the independent variables in a multiple regression model; it is usually invoked when some correlations are "large," but an actual magnitude is not well-defined |
|
|
Term
|
Definition
| a test of a null hypothesis involving more than one restriction on the parameters |
|
|
Term
| Multiple Linear Regression (MLR) Model |
|
Definition
| a model linear in its parameters, where the dependent variable is a function of independent variables plus an error term |
|
|
Term
| multiple regression analysis |
|
Definition
| a type of analysis that is used to describe estimation of and inference in the multiple linear regression model |
|
|
Term
|
Definition
| a function whose slope is not constant |
|
|
Term
|
Definition
| two or more models where no model can be written as a special case of the other by imposing restrictions on the parameters |
|
|
Term
|
Definition
| a sample obtained other than by sampling randomly from the population of interest |
|
|
Term
|
Definition
| a probability distribution commonly used in statistics and econometrics for modeling a population. Its probability distribution has a bell shape. |
|
|
Term
|
Definition
| the classical linear model assumption that states that the error (or dependent variable) has a normal distribution, conditional on the explanatory variables |
|
|
Term
|
Definition
| in classical hypothesis testing, we take this hypothesis as true and require the data to provide substantial evidence against it |
|
|
Term
|
Definition
| the bias that arises in the OLS estimators when a relevant variable is omitted from the regression |
|
|
Term
|
Definition
| an alternative hypothesis that states that the parameter is greater than (or less than) the value hypothesized under the null |
|
|
Term
|
Definition
| a hypothesis test against a one-sided alternative |
|
|
Term
| ordinary least squares (OLS) |
|
Definition
| a method for estimating the parameters of a multiple linear regression model. The OLS estimates are obtained by minimizing the sum of squared residuals |
|
|
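A minimal numpy sketch of this estimator, assuming the design matrix X already includes a column of ones for the intercept; minimizing the sum of squared residuals leads to the usual least-squares solution:

import numpy as np

def ols(X, y):
    """Return the coefficient vector that minimizes the sum of squared residuals."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # solves the least-squares problem
    return beta_hat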
Term
|
Definition
| observations in a data set that are substantially different from the bulk of the data, perhaps because of error or because some data are generated by a different model than most of the other data |
|
|
Term
| overall significance of a regression |
|
Definition
| a test of the joint significance of all explanatory variables appearing in a multiple regression equation |
|
|
Term
|
Definition
| the smallest significance level at which the null hypothesis can be rejected |
|
|
Term
|
Definition
| an unknown value that describes a population relationship |
|
|
Term
|
Definition
| the effect of an explanatory variable on the dependent variable, holding other factors in the regression model fixed |
|
|
Term
|
Definition
| the proportionate change in a variable, multiplied by 100 |
|
|
Term
|
Definition
| in multiple regression, one independent variable is an exact linear function of one or more other independent variables |
|
|
Term
|
Definition
| a probability distribution for count variables |
|
|
Term
|
Definition
| a well-defined group (of people, firms, cities, and so on) that is the focus of a statistical tool or econometric analysis |
|
|
Term
|
Definition
| the practical or economic importance of an estimate, which is measured by its sign and magnitude, as opposed to its statistical significance |
|
|
Term
|
Definition
| the estimate of an outcome obtained by plugging specific values of the explanatory variables into an estimated model, usually a multiple regression model |
|
|
Term
|
Definition
| a mathematical function where the vector argument both pre- and post-multiplies a square, symmetric matrix |
|
|
Term
|
Definition
| functions that contain squares of one or more explanatory variables; they capture diminishing or increasing effects on the dependent variable |
|
|
Term
|
Definition
| in a multiple regression model, the proportion of the total sample variation in the dependent variable that is explained by the independent variables |
|
|
Term
|
Definition
| a sample obtained by sampling randomly from the specified population |
|
|
Term
| Regression Specification Error Test (RESET) |
|
Definition
| a general test for functional form in a multiple regression model; it is an F test of the joint significance of the squares, cubes, and perhaps higher powers of the fitted values from the initial estimation |
|
|
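A sketch of the procedure described above, assuming numpy arrays y and X where X already includes a constant column; the squares and cubes of the fitted values are added and then tested for joint significance:

import numpy as np
import statsmodels.api as sm

def reset_test(y, X):
    fitted = sm.OLS(y, X).fit().fittedvalues
    X_aug = np.column_stack([X, fitted ** 2, fitted ** 3])
    res = sm.OLS(y, X_aug).fit()
    k = X.shape[1]
    R = np.zeros((2, X_aug.shape[1]))
    R[0, k] = 1.0        # coefficient on fitted**2
    R[1, k + 1] = 1.0    # coefficient on fitted**3
    ftest = res.f_test(R)                 # joint F test of the two added terms
    return float(np.squeeze(ftest.fvalue)), float(ftest.pvalue)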
Term
| Misspecification Analysis |
|
Definition
| the process of determining likely biases that can arise from omitted variables, measurement error, simultaneity, and other kinds of model misspecification |
|
|
Term
|
Definition
| the difference between the actual value and the fitted (or predicted) value; there is a residual for each observation in the sample used to obtain the OLS regression line |
|
|
Term
|
Definition
| in hypothesis testing, the model obtained after imposing all of the restrictions required under the null |
|
|
Term
|
Definition
| the percentage change in the dependent variable given a one-unit increase in an independent variable |
|
|
Term
|
Definition
| the probability of type I error in hypothesis testing |
|
|
Term
|
Definition
| in the equation of a line, the change in the y variable when the x variable increases by 1 |
|
|
Term
|
Definition
| the coefficient of an independent variable in a multiple regression model |
|
|
Term
|
Definition
| a correlation between two variables that is not due to causality, but perhaps to the dependence of the two variables on another unobserved factor |
|
|
Term
|
Definition
| a common measure of spread in the distribution of a random variable |
|
|
Term
|
Definition
| generically, an estimate of the standard deviation of an estimator |
|
|
Term
|
Definition
| the act of testing hypotheses about population parameters |
|
|
Term
|
Definition
| the importance of an estimate as measured by the size of a test statistic, usually a t statistic |
|
|
Term
| sum of squared residuals (SSR) |
|
Definition
| in multiple regression analysis, the sum of the squared OLS residuals across all observations |
|
|
Term
|
Definition
| the distribution of the ratio of a standard normal random variable and the square root of an independent chi-square random variable, where the chi-square random variable is first divided by its degrees of freedom |
|
|
Term
|
Definition
| the statistic used to test a single hypothesis about the parameters in an econometric model |
|
|
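A sketch of how this statistic and its two-sided p-value are commonly computed, assuming an estimate beta_hat, its standard error se, df degrees of freedom, and a hypothesized value under the null (names are hypothetical):

from scipy import stats

def t_test(beta_hat, se, df, hypothesized=0.0):
    t = (beta_hat - hypothesized) / se
    p_two_sided = 2 * (1 - stats.t.cdf(abs(t), df))
    return t, p_two_sided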
Term
|
Definition
| data collected over time on one or more variables |
|
|
Term
| Total Sum of Squares (SST) |
|
Definition
| the total sample variation in a dependent variable about its sample average |
|
|
Term
|
Definition
| the actual population model relating the dependent variable to the relevant independent variables, plus a disturbance, where the zero conditional mean assumption holds |
|
|
Term
|
Definition
| an alternative where the population parameter can be either less than or greater than the value stated under the null hypothesis |
|
|
Term
|
Definition
| a test against a two-sided alternative |
|
|
Term
|
Definition
| a rejection of the null hypothesis when it is true |
|
|
Term
|
Definition
| the failure to reject the null hypothesis when it is false |
|
|
Term
|
Definition
| when a null hypothesis is rejected in favor of a one-tailed alternative hypothesis but the test statistic has the opposite sign of the one claimed by the alternative hypothesis. |
|
|
Term
|
Definition
| an estimator whose expected value (or mean of its sampling distribution) equals the population value (regardless of the population value) |
|
|
Term
|
Definition
| a measure of spread in the distribution of a random variable |
|
|
Term
| Weighted Least Squares (WLS) Estimator |
|
Definition
| an estimator used to adjust for a known form of heteroskedasticity, where each squared residual is weighted by the inverse of the (estimated) variance of the error |
|
|
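A minimal sketch under a known variance form Var(u|x) = σ²·h(x), assuming h holds the (estimated) variance function evaluated at each observation; statsmodels' weights argument takes the inverse of that variance:

import statsmodels.api as sm

def wls_fit(y, X, h):
    """Weighted least squares with each observation weighted by 1/h."""
    return sm.WLS(y, sm.add_constant(X), weights=1.0 / h).fit()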
Term
|
Definition
| A test for heteroskedasticity that involves regressing the squared OLS residuals on the OLS fitted values and on the squares of the fitted values; in its most general form, the squared OLS residuals are regressed on the explanatory variables, the squares of the explanatory variables, and all the nonredundant interactions of the explanatory variables |
|
|
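A sketch using the het_white helper from statsmodels, assuming numpy arrays y and X without a constant column; it implements the squared-residual regression on the regressors, their squares, and cross products described above:

import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

def white_test(y, X):
    X_const = sm.add_constant(X)
    resid = sm.OLS(y, X_const).fit().resid
    return het_white(resid, X_const)   # (LM stat, LM p-value, F stat, F p-value)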
Term
|
Definition
| of two estimators, one is more efficient than the other if it has a smaller variance |
|
|
Term
|
Definition
| y is related to x and the error term in a linear function |
|
|
Term
|
Definition
| allows us to predict y from x |
|
|
Term
|
Definition
|
|
Term
|
Definition
| a random sample of size n (each member of the population has an equal chance of selection) |
|
|
Term
|
Definition
| leads to sample-based statistics that approximate the values of the population parameters |
|
|
Term
|
Definition
|
|
Term
|
Definition
| 1) none of the independent variables is constant 2) no exact linear relationships among the independent variables |
|
|
Term
|
Definition
| lets us determine which explanatory variable has which effect |
|
|
Term
|
Definition
| the slope and intercept estimates are not defined |
|
|
Term
|
Definition
| u has an expected value of 0 with any value of the independent variable |
|
|
Term
|
Definition
| allows us to derive statistical properties conditional on the values of x in the sample |
|
|
Term
|
Definition
| likely omitted an important variable and so the explanatory power suffers. This is an example of bias due to misspecification |
|
|
Term
|
Definition
| u has the same (constant) variance given any value of the independent variable |
|
|
Term
|
Definition
| needed to justify t tests, F tests, and confidence intervals |
|
|
Term
|
Definition
| heteroskedasticity (non constant variance) affects efficiency |
|
|
Term
|
Definition
| u is independent of the explanatory variables and is normally distributed with a mean of zero |
|
|
Term
|
Definition
| that makes statistical inference possible |
|
|
Term
|
Definition
| creates problems with confidence intervals and significance tests because they are based on assumptions of normally distributed errors |
|
|
Term
| difference between efficiency, consistency, and unbiased |
|
Definition
| 1) efficiency: of two estimators, one is more efficient than the other if it has a smaller variance 2) consistency: as the sample size increases, the estimator converges (in probability) to the true population value 3) bias: the difference between the expected value of an estimator and the true population value |
|
|
Term
|
Definition
| Type III error occurs when a null hypothesis is rejected in favor of a one-tailed alternative hypothesis but the test statistic has the opposite sign of the one claimed by the alternative hypothesis. |
|
|
Term
| What is the difference between an F test and R squared? |
|
Definition
| An F statistic is the only real measure of goodness of fit, as the R-squared only accounts for the proportion of variance explained |
|
|
Term
| Compare errors (measurement error, individual error, random error, population variance, sample variance, standard deviation, standard error, residual error, type I, type II, type III). |
|
Definition
| 1) Individual error: the difference between the expected value and an individual observed value. 2) Random error: error due to random variability in an individual observation (part of individual error). 3) Population variance (estimate): the sum of the squared deviations of the observed values from the mean, divided by the number in the population. 4) Sample variance: the sum of the squared deviations of the observed values from the mean (the individual errors), divided by the number of comparisons (sample size minus 1); squaring the individual errors removes the negative signs.
5) Standard deviation: a standardized unit of error that accounts for both the magnitude of the observed values and the number of observations in the sample; it is the square root of the sample variance. In a normally distributed sample, roughly 68% of the area under the distribution curve lies within one standard deviation of the mean, so there is about a 68% chance of finding a value within one standard deviation of the mean. 6) Standard error: a narrower measure of spread about the mean than the standard deviation, calculated as the standard deviation divided by the square root of the number of observations. As the number of observations increases, the estimate of the sample mean gets closer to the "true" (population) mean; this is the error reported with mean values, and it is the error term used in the t statistic for hypothesis testing and the 95% confidence interval. 7) Residual error: in regression analysis, the variation in the dependent variable not explained by the variation in the independent variables; it is found by computing the fitted values of y from the regression model and the observed values of x, and taking the difference between the observed and fitted values.
8) Errors associated with hypothesis testing: a Type I error is rejecting a null hypothesis when it is really true; a Type II error is accepting (failing to reject) the null hypothesis when it is really false; a Type III error is when the rejection of the null hypothesis is correct but the one-tailed alternative that is accepted is incorrect. |
|
|
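A small numpy illustration of the sample variance, standard deviation, and standard error of the mean described above, using hypothetical data:

import numpy as np

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])          # hypothetical observations
n = len(x)
sample_var = np.sum((x - x.mean()) ** 2) / (n - 1)     # divide by n - 1 (number of comparisons)
sample_sd = np.sqrt(sample_var)                        # standard deviation
std_error = sample_sd / np.sqrt(n)                     # standard error of the sample mean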
Term
| What are dummy (binary), proxy, quadratic, interaction, and natural logarithms and when are they used? |
|
Definition
| 1) dummy variables are coded as 0 or 1 and used to include qualitative data 2) proxy variables are used when the needed variable can't be measured, so you find something similar to replace it 3) quadratic terms are used when the data have a turning point 4) interaction terms are used when the effect of one variable depends on the partial effect of another variable 5) logarithms are used when we need to bring numbers down and they reduce variability (white noise); they only work for positive numbers. |
|
|
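A hedged statsmodels sketch showing a dummy, a quadratic term, an interaction, and a natural-log transformation in one formula; the variables wage, educ, exper, and female and the tiny synthetic data are purely illustrative:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Tiny synthetic data, purely for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "educ": rng.integers(8, 18, 200),
    "exper": rng.integers(0, 30, 200),
    "female": rng.integers(0, 2, 200),
})
df["wage"] = np.exp(0.5 + 0.08 * df["educ"] + 0.02 * df["exper"]
                    - 0.0003 * df["exper"] ** 2 - 0.2 * df["female"]
                    + rng.normal(0, 0.3, 200))

# female is a 0/1 dummy, I(exper**2) a quadratic term, female:educ an interaction,
# and np.log(wage) puts the dependent variable in logarithmic form.
model = smf.ols("np.log(wage) ~ educ + exper + I(exper**2) + female + female:educ",
                data=df).fit()
print(model.summary())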
Term
| How do we read a p-value in the absence of critical values? |
|
Definition
| "the probability of committing a Type I error if we reject the null hypothesis that _____ is 4.8% (p value: 0.048)." |
|
|
Term
| Five steps for checking a model |
|
Definition
| 1) F stat 2) P value associated with F stat 3) R squared 4) signs on coefficients 5) individual significance |
|
|
Term
| What is the difference between reliability and validity? |
|
Definition
| 1) reliability (consistency): the degree to which measures yield the same result when applied under the same circumstances 2) validity: the effectiveness of measuring instruments, to the extent that the instrument measures the phenomenon one wants to study |
|
|
Term
|
Definition
| the dummy variable with a value of zero |
|
|
Term
|
Definition
| test for functional form misspecification |
|
|
Term
|
Definition
|
|
Term
|
Definition
| null hypothesis is true, but we reject it |
|
|
Term
|
Definition
| null hypothesis is false, but we don't reject it |
|
|
Term
|
Definition
| statistically significant effect, but does not follow the hypothesis |
|
|
Term
|
Definition
| p value associated with F used to test the null hypothesis that all the model coefficients are zero |
|
|
Term
|
Definition
| the variance of the unobservable error, u, conditional on the explanatory variable, is not constant |
|
|
Term
|
Definition
| the value of y when x equals 0 |
|
|
Term
|
Definition
| a variable that is the product of two variables |
|
|
Term
| What are the five steps for checking a model? |
|
Definition
| 1. look at f-statistic 2. look at p-value associated with f statistic 3. see how much variation is explained by the model (r-squared) 4. look at coefficients for correct signs 5. are variables significant |
|
|
Term
| Explain confidence interval. |
|
Definition
| we are 95% sure that the true value of the coefficient in the model that generated these data falls within this interval |
|
|
Term
| How do we read P>|t| = 0.000 or P>|t| = 0.048? |
|
Definition
| the probability of committing a Type I error if we reject the null hypothesis (that the slope coefficient is zero) is essentially zero (less than 1 in 1,000); for 0.048, it is 4.8% |
|
|
Term
|
Definition
| the % of our samples in which we want our confidence interval to contain the population value |
|
|
Term
|
Definition
| as the sample size increases, the estimate (for example, the slope) converges in probability to the true population value |
|
|
Term
|
Definition
| the degree to which measures yield the same results when applied under the same circumstances |
|
|
Term
|
Definition
| the effectiveness of the measuring instrument, to the extent that the instrument measures the phenomenon one wants to study |
|
|