Shared Flashcard Set

Details

Regression with Categorical Dependent Variables
for Social Sciences
20
Sociology
Graduate
01/04/2010

Additional Sociology Flashcards

 


 

Cards

Term
transformational (or statistical) approach
Definition
categorical data as inherently categorical. Thus, the analytical approach is to transform the dependent variable so that regression techniques can be applied.
Term
latent variable (or econometric) approach
Definition
most commonly used in economics and psychometrics—sees categorical data as conceptual continuous but measured as categorical. For example, a dichotomous measure of gender as male/female is a latent variable because it actually captures a range of masculine and feminine qualities.
Term
Bernoulli trials
Definition
With logit models, the purposes are to classify cases into groups and then check to see how well we did. In statistical terms, the attempts at classification are called Bernoulli trials, and if the classification with information is no better than the classification without information, then the information used does not significantly predict the outcome (eating breakfast this morning versus not).
Term
Probability
Definition
the chance of success. Mathematically, it is the number of times a success occurs divided by the number of times that it could occur. If, in a population of ten people, five of them vote, then the probability of voting is 5/10 or .5.
Term
odds
Definition
the probability of success divided by the probability of failure. If the probability of voting is .5, then the probability of not voting is also .5. Thus, the odds are 1.
Term
Odds ratios
Definition
the ratio of two odds. Suppose we have the voting patterns for both men and women. If 75% of women vote and 65% of men vote, the odds for women voting is .75/.25=3 and the odds for men voting is .6/.4=1.5. The odds ratio is then 3/1.5=2 meaning that the odds are 2 to 1 that women will vote compared to men.
Term
Log odds
Definition
(also known as logits) are the natural logarithm of the odds.
Term
iterations
Definition
are required because logistic regression is based on a maximum likelihood function, so Stata is attempting calculations to see where it obtains an overall likelihood of estimating the observed data greater than with the previous calculation. I liken the process to standing on a hill blindfolded and trying to reach the top. To do this, I put out my foot and find the place where a step takes me in an upward direction. I know I’ve reached the top when a step in any direction takes me downhill. In logit, at iteration zero, you are fitting the model with only the constant (i.e., no information). The last iteration occurs when the addition of more information does not improve the model.
Term
convergence
Definition
reached in the last iteration
Term
likelihood ratio chi-squared test
Definition
determines if the model you’ve created is significantly better than a model with no independent variables (i.e., random assignment of outcomes).
Term
overall chi-squared
Definition
In logistic regression parlance, the likelihood ratio chi-squared test
Term
z-test
Definition
An examination of the z-test shows that education is useful in classifying respondents as voters or non-voters (p<.001). It is important to note, however, that a z-test does not have the same level of validity as a t-test in OLS regression. In a bivariate OLS regression model, the t-test for the variable and the F-test for the overall model would yield the same results. In bivariate logit, the overall chi-squared and the z-test can yield conflicting results. For example, the overall chi-squared test may tell you that X is a significant predictor of Y, while the z-test may tell you that it is not. In such cases, the overall chi-squared test should prevail.
Term
Pseudo R2
Definition
a value similar to R2 in that it ranges from 0 to 1 and gives an indication of the goodness of fit of the model. There are many different ways that Pseudo R2 can be calculated, but the calculation shown by Stata is McFadden’s Pseudo R2. Unlike the coefficient of determination, Pseudo R2 is not a PRE measure; thus, it cannot tell you the percent of variance explained in the model. Instead, Pseudo R2 is more similar to tau-b in that values closer to 0 indicate poor fit, while values closer to 1 indicate good fit.
Term
odds ratios (in regression and basic rules)
Definition
. To obtain more usable information, we must convert the regression coefficients (known as logits) into odds ratios. The conversion requires raising the natural log base e to the power of the logit. Odds ratios can be interpreted as the number of times more or less likely a case is to succeed rather than fail than another case. 1. An odds ratio of less than 1 indicates a negative relationship.
2. An odds ratio equal to 1 indicates no relationship.
3. An odds ratio greater than 1 indicates a positive relationship.
Term
convert my odds ratios to percentages
Definition
If the odds ratio is larger than 1, the conversion is to subtract 1 from the odds ratio and multiply by 100. Thus, the odds ratio of 1.25 can be converted to 25 percent (1.25 – 1 x 100). This means that respondents with a given level of education are 25 percent more likely to vote than respondents with the next lower level of education.
If the odds ratio is less than 1, the conversion is to subtract the odds ratio from 1 and multiply by 100. Thus, if the odds ratio was .25, this would result in a conversion to 75 percent (1-.25 x 100). Thus those at a given level of X are 75 percent less likely to succeed than those in the next lowest level of X.
Term
Polytomous variables
Definition
ordinal or nominal variables that are not binary
Term
likelihood ratio chi-squared test
Definition
For multivariate logistic regression, In this test, we determine if the -2 log likelihood for the full model (the one with the controls) is better than for a reduced model. The -2 log likelihood is the value produced at the final maximum likelihood iteration, multiplied by -2. In your Stata output, the -2 log likelihood is recorded as “log likelihood” in the results window and is equal to -1518.74. The -2 log likelihood is the maximum likelihood of falling into the classifications logged and multiplied by -2. The mathematical transformation is necessary in order to produce a number large enough to see (instead of just a decimal followed by a lot of zeros).
Term
Ordered logit
Definition
used when the dependent variable is ordinal and polytomous.
Term
Multinomial logistic regression
Definition
used when the dependent variable is polytomous and not ordered, such as well the outcome is Democrat, Republican, and Independent.
Term
relative risk ratios
Definition
The coefficients generated in the multinomial regression model cannot be converted to odds ratios using a simple subcommand. However, the coefficients can be converted to relative risk ratios that indicate the probability of an outcome for one group over another. Always keep in mind that relative risk ratios, like odds ratios, indicate a negative relationship if values are less than 1 and positive relationship if values are greater than 1.
Supporting users have an ad free experience!