Term
| What is a Type I error (false positive)? |
|
Definition
| A Type I error poses less risk to a software quality program, since it classifies a module as high risk when it is actually low risk; the cost is only the extra effort spent reviewing a module that did not need it. |
|
|
Term
| What is a Type II error (false negative)? |
|
Definition
| A Type II error is the more serious one because a module is classified as low risk when it is actually high risk, so a fault-prone module escapes review. |
|
|
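A quick Python sketch (the data and labels are illustrative, not from the source) showing how the two error types are counted for a fault-proneness classifier:

    # Type I (false positive): an actually low-risk (nfp) module classified
    # as high risk (fp). Type II (false negative): the reverse.
    actual    = ["nfp", "fp", "fp", "nfp", "fp", "nfp"]
    predicted = ["fp",  "fp", "nfp", "nfp", "fp", "nfp"]

    type1 = sum(a == "nfp" and p == "fp" for a, p in zip(actual, predicted))
    type2 = sum(a == "fp" and p == "nfp" for a, p in zip(actual, predicted))
    print(type1, type2)  # 1 1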
Term
| Similarities between logistic regression and multiple linear regression. |
|
Definition
| The right-hand sides of the equations are very similar: both model the response through a linear combination of the independent variables, b0 + b1x1 + ... + bkxk. |
|
|
Term
| Differences between logistic regression and multiple linear regression. |
|
Definition
| Multiple linear regression is used to predict a value. Logistic regression is used to predict a class. |
|
|
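A minimal sketch, assuming scikit-learn (the source names no library), showing the shared right-hand side and the two kinds of output:

    from sklearn.linear_model import LinearRegression, LogisticRegression

    X = [[10], [20], [30], [40], [50]]   # a single module metric (illustrative)
    faults = [0, 1, 2, 4, 6]             # numeric target for linear regression
    risk = [0, 0, 0, 1, 1]               # class target for logistic regression

    # Both models fit b0 + b1*x on the right-hand side.
    print(LinearRegression().fit(X, faults).predict([[35]]))  # predicts a value
    print(LogisticRegression().fit(X, risk).predict([[35]]))  # predicts a class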
Term
| How do you validate a model with a train/test (data splitting) approach? |
|
Definition
| Split the data into two parts, one being training and other being test. |
|
|
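A minimal sketch, assuming scikit-learn's train_test_split (hypothetical data):

    from sklearn.model_selection import train_test_split

    X = [[i] for i in range(20)]
    y = [0] * 10 + [1] * 10

    # Hold out 30% of the data as the test set; fit only on the rest.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)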
Term
| What is resubstitution (evaluating a model on its own training/fit data)? |
|
Definition
| You build a model using the training (fit) data set, then substitute that same training data back into the model and look at the accuracy and other performance measures. |
|
|
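A sketch of resubstitution (assumed setup, using scikit-learn): the model is scored on the very data it was fit to, which tends to be optimistic:

    from sklearn.tree import DecisionTreeClassifier

    X = [[i] for i in range(20)]
    y = [0, 1] * 10

    model = DecisionTreeClassifier().fit(X, y)   # fit on the training data
    print(model.score(X, y))                     # evaluate on the same data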
Term
| How can a model be validated across software projects? |
|
Definition
| You have a software project and subsequent software projects. You use one software project to build the model, then use the other software projects as the test data. |
|
|
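A sketch with hypothetical project data: fit on one project, test on another:

    from sklearn.linear_model import LogisticRegression

    X_projA, y_projA = [[5], [10], [40], [60]], [0, 0, 1, 1]  # fit project
    X_projB, y_projB = [[8], [12], [45], [70]], [0, 0, 1, 1]  # test project

    model = LogisticRegression().fit(X_projA, y_projA)
    print(model.score(X_projB, y_projB))  # accuracy on the unseen project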
Term
| What is 10-fold cross-validation? |
|
Definition
| Split the data into 10 parts; use 9 parts to build the model and the remaining part to test it. Repeat this 10 times, so each part serves once as the test set, and then combine the results. |
|
|
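A minimal sketch, assuming scikit-learn: cross_val_score with cv=10 runs the 10 build/test rounds and returns the 10 scores to combine:

    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression

    X = [[i] for i in range(50)]
    y = [0] * 25 + [1] * 25

    scores = cross_val_score(LogisticRegression(), X, y, cv=10)
    print(scores.mean())  # combined (averaged) result of the 10 rounds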
Term
| What is the precision of a fault-prone classification? |
|
Definition
| The proportion of modules that are actually fault-prone (correctly predicted) out of all the modules that were classified as fault-prone. |
|
|
Term
| What is the recall of a fault-prone classification? |
|
Definition
| The number of correctly predicted fault-prone modules divided by the total number of modules that are actually fault-prone. |
|
|
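A quick sketch computing both measures from illustrative labels (1 = fault-prone):

    actual    = [1, 1, 1, 0, 0, 1]   # truly fault-prone modules
    predicted = [1, 0, 1, 1, 0, 1]   # modules classified as fault-prone

    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    precision = tp / sum(predicted)  # correct / all classified fault-prone
    recall    = tp / sum(actual)     # correct / all actually fault-prone
    print(precision, recall)         # 0.75 0.75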
Term
| For linear regression models, among the selection methods greedy, M5, and no selection, which method will use the most independent variables? |
|
Definition
| No selection, because none of the variables are pruned. |
|
|
Term
| What characterizes an over-fitted numeric prediction or classification model? |
|
Definition
| The model performs very well on the training data but very badly on the test data (the model looks too good to be true). |
|
|
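A sketch of spotting over-fitting (assumed setup): with pure-noise labels, a deep tree scores near 1.0 on its training data but near 0.5 on held-out data:

    import random
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    random.seed(0)
    X = [[random.random()] for _ in range(200)]
    y = [random.randint(0, 1) for _ in range(200)]   # noise: nothing to learn

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = DecisionTreeClassifier().fit(X_tr, y_tr)
    print(model.score(X_tr, y_tr))   # near 1.0 (memorized)
    print(model.score(X_te, y_te))   # near 0.5 (no real signal)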
Term
| Worked example: classifying modules as NFP or FP using a cutoff c on the NFP/FP ratio. |
|
Definition
| Class(xi) = NFP if (NFP/FP) > c, FP otherwise.
A mean value of 0.75 means 75% of the modules are NFP and 25% are FP, so c = 0.75/0.25 = 3.
(a) ratio 2: 2 is not greater than 3, so FP. (b) ratio 5: 5 > 3, so NFP. |
|
|
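A sketch of the rule above (the function name is mine):

    def classify(nfp_over_fp_ratio, c=3.0):
        # NFP only if the NFP/FP ratio exceeds the cutoff c = 0.75/0.25 = 3.
        return "NFP" if nfp_over_fp_ratio > c else "FP"

    print(classify(2))   # 2 is not > 3 -> FP
    print(classify(5))   # 5 > 3        -> NFP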
Term
| Worked example: (A) nearest-neighbor fault prediction and (B) NFP/FP classification using distance ratios. |
|
Definition
| (A) Module K (value 20) falls between C and E; faults C = 0, E = 1; predicted faults = (0+1)/2 = 0.5. Module L (value 50) falls between I and J; faults I = 7, J = 10; predicted faults = (7+10)/2 = 8.5. Module M (value 38) falls between H and I; faults H = 5, I = 7; predicted faults = (5+7)/2 = 6.
(B) For dfp find the 2 closest fault-prone modules; for dnfp find the 2 closest not-fault-prone modules. Class(xi) = NFP if (dfp/dnfp) > c, FP otherwise (here c = 0.5).
K (20): dfp from 35 and 40 is ((35-20)+(40-20))/2 = 17.5; dnfp from 21 and 22 is ((21-20)+(22-20))/2 = 1.5; 17.5/1.5 > 0.5, so NFP.
L (50): dfp from 40 and 55 is (|40-50|+|55-50|)/2 = 7.5; dnfp from 29 and 30 is (|29-50|+|30-50|)/2 = 20.5; 7.5/20.5 is not > 0.5, so FP.
M (38): dfp from 35 and 40 is (|35-38|+|40-38|)/2 = 2.5; dnfp from 29 and 30 is (|29-38|+|30-38|)/2 = 8.5; 2.5/8.5 is not > 0.5, so FP. |
|
|
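A sketch reproducing the part (B) computation; the lists of FP and NFP module values are inferred from the worked example above:

    fp_values  = [35, 40, 55]       # known fault-prone module values
    nfp_values = [21, 22, 29, 30]   # known not-fault-prone module values

    def classify(x, c=0.5):
        dfp  = sum(sorted(abs(x - v) for v in fp_values)[:2]) / 2
        dnfp = sum(sorted(abs(x - v) for v in nfp_values)[:2]) / 2
        return "NFP" if dfp / dnfp > c else "FP"

    for x in (20, 50, 38):
        print(x, classify(x))   # 20 NFP, 50 FP, 38 FP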
Term
| What are the steps of a module-order model (ranking modules by predicted faults)? |
|
Definition
| 1. Use numerical prediction to predict the number of faults in each module. 2. Order the modules based on the predicted number of faults. 3. Evaluate the quality of the ranking using the actual number of faults. |
|
|
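A sketch of the three steps with hypothetical modules: predict, rank, then check the ranking against the actual faults:

    predicted = {"A": 9.5, "B": 0.4, "C": 6.0, "D": 2.2, "E": 8.1}  # step 1
    actual    = {"A": 10,  "B": 0,   "C": 7,   "D": 1,   "E": 5}

    ranking = sorted(predicted, key=predicted.get, reverse=True)    # step 2
    top = ranking[:1]                                               # top 20%
    caught = sum(actual[m] for m in top) / sum(actual.values())     # step 3
    print(ranking, round(caught, 2))  # ['A', 'E', 'C', 'D', 'B'] 0.43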