Term
What is a Type I error (false positive)?

Definition
A Type I error classifies a module as high risk when it is actually low risk. It is the less risky of the two misclassification errors.


Term
What is a Type II error (false negative)?

Definition
A Type II error classifies a module as low risk when it is actually high risk. It is the more important error because a high-risk module goes undetected.
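As an illustration of the two error types, the sketch below counts them from actual and predicted labels. The labels "FP" (fault-prone, high risk) and "NFP" (not fault-prone, low risk), the function name `error_rates`, and the sample data are assumptions made for the example.

```python
def error_rates(actual, predicted):
    """Return (type_i_rate, type_ii_rate) for fault-proneness labels."""
    pairs = list(zip(actual, predicted))
    # Type I: actually NFP (low risk) but classified as FP (high risk).
    type_i = sum(a == "NFP" and p == "FP" for a, p in pairs)
    # Type II: actually FP (high risk) but classified as NFP (low risk).
    type_ii = sum(a == "FP" and p == "NFP" for a, p in pairs)
    n_nfp = sum(a == "NFP" for a in actual)
    n_fp = sum(a == "FP" for a in actual)
    type_i_rate = type_i / n_nfp if n_nfp else 0.0
    type_ii_rate = type_ii / n_fp if n_fp else 0.0
    return type_i_rate, type_ii_rate


# Hypothetical labels, for illustration only.
actual = ["NFP", "NFP", "NFP", "NFP", "FP", "FP"]
predicted = ["NFP", "FP", "NFP", "NFP", "FP", "NFP"]
print(error_rates(actual, predicted))
```

Each rate is taken relative to the modules that actually belong to the affected class, which is one common convention.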


Term
Similarities between logistic regression and multiple linear regression. 

Definition
The equations are very similar on the right-hand side: both use a linear combination of the independent variables.


Term
Differences between logistic regression and multiple linear regression. 

Definition
Multiple linear regression predicts a numeric value. Logistic regression predicts a class.
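A minimal sketch of the contrast, assuming the logistic model estimates the probability of the "FP" class and that a cutoff of 0.5 turns that probability into a class. The coefficients and function names are hypothetical.

```python
import math


def linear_predictor(coefs, intercept, x):
    # Shared right-hand side: b0 + b1*x1 + ... + bk*xk
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def predict_linear(coefs, intercept, x):
    # Multiple linear regression: the linear predictor is the predicted value.
    return linear_predictor(coefs, intercept, x)


def predict_logistic(coefs, intercept, x, cutoff=0.5):
    # Logistic regression: the linear predictor is treated as log-odds,
    # mapped to a probability, then compared with a cutoff to get a class.
    z = linear_predictor(coefs, intercept, x)
    p = 1.0 / (1.0 + math.exp(-z))
    return ("FP" if p >= cutoff else "NFP"), p


coefs, intercept, x = [2.0, -1.0], 0.5, [1.0, 3.0]  # hypothetical values
print(predict_linear(coefs, intercept, x))
print(predict_logistic(coefs, intercept, x))
```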


Term
Data splitting

Definition
Split the data into two parts: a training set and a test set.
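One way to sketch such a split, assuming a random shuffle and a one-third test share. Neither choice comes from the card, and `split_data` is a hypothetical helper.

```python
import random


def split_data(rows, test_fraction=1 / 3, seed=0):
    """Return (training, test) lists; every row lands in exactly one part."""
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


training, test = split_data(range(9))
print(len(training), len(test))
```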


Term
Resubstitution

Definition
Build a model using the training (fit) data set, then substitute that same data set back into the model and measure its accuracy and other performance metrics.
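The procedure can be sketched as follows. The one-variable threshold model, the labels "FP" and "NFP", and the fit data are assumptions chosen to keep the example small.

```python
def fit_threshold(data):
    """Fit a rule on (metric, label) pairs: predict FP above the midpoint of the class means."""
    fp = [x for x, y in data if y == "FP"]
    nfp = [x for x, y in data if y == "NFP"]
    cut = (sum(fp) / len(fp) + sum(nfp) / len(nfp)) / 2
    return lambda x: "FP" if x > cut else "NFP"


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


# Hypothetical fit (training) data.
fit_data = [(10, "NFP"), (12, "NFP"), (30, "NFP"),
            (35, "FP"), (40, "FP"), (55, "FP")]
model = fit_threshold(fit_data)
# The same data used to build the model is substituted back in to score it.
resub_accuracy = accuracy(model, fit_data)
print(resub_accuracy)
```

Because the model is scored on the data it was fitted to, this estimate tends to be optimistic compared with accuracy on unseen data.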


Term

Definition
You have one software project and one or more subsequent software projects. Build the model on one project, then use the other projects as test data.


Term
10-fold cross-validation

Definition
Split the data into 10 parts. Use 9 parts to build the model and the remaining part to test it. Repeat this 10 times so that each part is used for testing once, then combine the results.
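A minimal sketch of how the 10 parts might be generated. Assigning every tenth row to the same part is an assumption, as is the helper name `k_fold_indices`. How the 10 results are combined (for example by averaging) is left to the caller.

```python
def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    # Part i holds indices i, i + k, i + 2k, ...
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # The other k - 1 parts are used to build the model.
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx


for train_idx, test_idx in k_fold_indices(20):
    print(len(train_idx), test_idx)
```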


Term
Precision

Definition
The proportion of modules that are actually fault-prone (correctly predicted) out of all the modules that were classified as fault-prone.


Term
Recall

Definition
The number of correctly predicted fault-prone modules divided by the total number of modules that are actually fault-prone.
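The two ratios on the preceding cards are commonly called precision and recall. A small sketch, assuming the labels "FP" and "NFP" and hypothetical sample data:

```python
def precision_recall(actual, predicted, positive="FP"):
    """Return (precision, recall) for the fault-prone class."""
    # Correctly predicted fault-prone modules.
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    predicted_pos = sum(p == positive for p in predicted)  # classified as fault-prone
    actual_pos = sum(a == positive for a in actual)        # actually fault-prone
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical labels, for illustration only.
actual = ["FP", "FP", "FP", "FP", "NFP", "NFP"]
predicted = ["FP", "FP", "NFP", "NFP", "FP", "NFP"]
print(precision_recall(actual, predicted))
```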


Term
For linear regression models, among the selection methods greedy, M5, and no selection, which method will use the largest number of independent variables?

Definition
No selection, because none of the variables are pruned.


Term
An overfitted numeric prediction or classification model

Definition
The model performs very well on the training data but very poorly on the test data (the model looks too good to be true).


Term

Definition
class(xi) = NFP if (NFP / FP) > c; FP otherwise.
A mean value of 0.75 means 75% of the modules are NFP and 25% are FP, so NFP / FP = 0.75 / 0.25 = 3.
(a) c = 2: 3 > 2, so NFP. (b) c = 5: 3 < 5, so FP.
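The rule can be sketched in a few lines, assuming the mean value is the proportion of NFP modules (as in the 0.75 example) and that 2 and 5 are the two values of c being compared against the ratio of 3. The function name is hypothetical.

```python
def classify_by_ratio(mean_value, c):
    """Classify as NFP when the NFP-to-FP ratio exceeds c, otherwise FP."""
    nfp = mean_value        # proportion of NFP modules
    fp = 1.0 - mean_value   # proportion of FP modules
    if fp == 0:
        # No FP modules at all: the ratio is unbounded, so treat as NFP.
        return "NFP"
    return "NFP" if nfp / fp > c else "FP"


print(classify_by_ratio(0.75, 2))  # ratio 3 compared with c = 2
print(classify_by_ratio(0.75, 5))  # ratio 3 compared with c = 5
```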


Term

Definition
(A) Predict the number of faults as the average of the two nearest neighbors.
K (20): between C and E; faults C = 0, E = 1; (0 + 1) / 2 = 0.5
L (50): between I and J; faults I = 7, J = 10; (7 + 10) / 2 = 8.5
M (38): between H and I; faults H = 5, I = 7; (5 + 7) / 2 = 6
(B) For dfp, find the 2 closest FP modules; for dnfp, find the 2 closest NFP modules.
class(xi) = NFP if (dfp / dnfp) > c; FP otherwise, with c = 0.5.
K (20): dfp from 35 and 40: (|35 - 20| + |40 - 20|) / 2 = 17.5; dnfp from 21 and 22: (|21 - 20| + |22 - 20|) / 2 = 1.5; dfp / dnfp = 17.5 / 1.5 > 0.5, so NFP
L (50): dfp from 40 and 55: (|40 - 50| + |55 - 50|) / 2 = 7.5; dnfp from 29 and 30: (|29 - 50| + |30 - 50|) / 2 = 20.5; dfp / dnfp = 7.5 / 20.5 < 0.5, so FP
M (38): dfp from 35 and 40: (|35 - 38| + |40 - 38|) / 2 = 2.5; dnfp from 29 and 30: (|29 - 38| + |30 - 38|) / 2 = 8.5; dfp / dnfp = 2.5 / 8.5 < 0.5, so FP
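The distance-ratio rule in part (B) can be sketched as below. The card's full data table is not shown, so the FP and NFP values are restricted to the ones quoted on the card. That restriction, the use of absolute differences as the distance, and the function names are assumptions.

```python
def mean_distance_to_nearest(value, neighbors, k=2):
    """Average absolute distance from value to its k closest neighbors."""
    distances = sorted(abs(value - n) for n in neighbors)
    return sum(distances[:k]) / k


def classify(value, fp_values, nfp_values, c=0.5, k=2):
    d_fp = mean_distance_to_nearest(value, fp_values, k)
    d_nfp = mean_distance_to_nearest(value, nfp_values, k)
    if d_nfp == 0:
        # The module coincides with its nearest NFP neighbors.
        return "NFP", d_fp, d_nfp
    # Far from FP modules relative to NFP modules means NFP.
    return ("NFP" if d_fp / d_nfp > c else "FP"), d_fp, d_nfp


# Only the values quoted on the card; the full table is not available.
fp_values = [35, 40, 55]
nfp_values = [21, 22, 29, 30]

print(classify(20, fp_values, nfp_values))  # module K
print(classify(50, fp_values, nfp_values))  # module L
```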


Term

Definition
1. Use numerical prediction to predict the number of faults in each module.
2. Order the modules by predicted faults.
3. Evaluate the quality of the ordering using the actual number of faults.
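The three steps can be sketched as below. The card does not say how quality is measured, so the share of actual faults captured by the top-ranked modules is an assumed measure. The module data and the function name are hypothetical.

```python
def rank_and_evaluate(modules, cutoff_fraction):
    """modules: list of (name, predicted_faults, actual_faults) tuples."""
    # Step 2: order modules from most to fewest predicted faults.
    ranked = sorted(modules, key=lambda m: m[1], reverse=True)
    n_top = max(1, round(len(ranked) * cutoff_fraction))
    # Step 3: judge the ordering with the actual fault counts.
    caught = sum(m[2] for m in ranked[:n_top])
    total = sum(m[2] for m in modules)
    return [m[0] for m in ranked], (caught / total if total else 0.0)


# Step 1 is assumed done: predicted_faults come from some numeric prediction model.
modules = [("A", 0.4, 0), ("B", 6.1, 7), ("C", 2.0, 1), ("D", 3.5, 2)]
order, captured = rank_and_evaluate(modules, cutoff_fraction=0.5)
print(order, captured)
```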

