# Shared Flashcard Set

## Details

Data Mining
n/a
15
Computer Science
11/27/2011

Term
 What is a Type I error (false positive)?
Definition
 A Type I error classifies a module as high risk when it is actually low risk. It is the less risky error for a classification model, since a low-risk module is only flagged unnecessarily.
Term
 What is a Type II error (false negative)?
Definition
 A Type II error classifies a module as low risk when it is actually high risk. It is the more important error, because a high-risk module goes undetected.
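Both error types can be counted directly from actual and predicted labels. A minimal sketch, assuming labels are the strings "FP" (fault-prone, high risk) and "NFP" (not fault-prone, low risk); the function `count_errors` and the sample labels are hypothetical.

```python
def count_errors(actual, predicted):
    """Return (type_i, type_ii) counts for FP/NFP labels."""
    type_i = 0   # predicted FP (high risk) but actually NFP (low risk)
    type_ii = 0  # predicted NFP (low risk) but actually FP (high risk)
    for a, p in zip(actual, predicted):
        if p == "FP" and a == "NFP":
            type_i += 1
        elif p == "NFP" and a == "FP":
            type_ii += 1
    return type_i, type_ii


actual = ["FP", "NFP", "NFP", "FP", "NFP"]
predicted = ["FP", "FP", "NFP", "NFP", "NFP"]
print(count_errors(actual, predicted))  # (1, 1)
```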
Term
 Similarities between logistic regression and multiple linear regression.
Definition
 The right-hand sides of the equations are very similar: both are a linear combination of the independent variables, b0 + b1x1 + ... + bkxk.
Term
 Differences between logistic regression and multiple linear regression.
Definition
 Multiple linear regression predicts a numeric value (such as the number of faults). Logistic regression predicts a class (such as fault-prone or not fault-prone).
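A minimal sketch of both models sharing the same right-hand side. The coefficients in `B` are hypothetical, the logistic model is assumed to give the probability that a module is FP, and the 0.5 threshold is an assumed default.

```python
import math

# Hypothetical coefficients b0, b1, b2, shared by both models for illustration.
B = [0.5, 0.2, -0.1]


def linear_part(x):
    """Common right-hand side: b0 + b1*x1 + ... + bk*xk."""
    return B[0] + sum(b * xi for b, xi in zip(B[1:], x))


def linear_regression(x):
    """Multiple linear regression: the right-hand side is the predicted value."""
    return linear_part(x)


def logistic_regression(x, threshold=0.5):
    """Logistic regression: the right-hand side is the log-odds of FP, turned
    into a probability with the logistic function and then into a class."""
    p_fp = 1.0 / (1.0 + math.exp(-linear_part(x)))
    return "FP" if p_fp > threshold else "NFP"


print(linear_regression([10, 5]))    # a numeric prediction, about 2.0
print(logistic_regression([10, 5]))  # FP
print(logistic_regression([0, 10]))  # NFP
```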
Term
 Data splitting
Definition
 Split the data into two parts: a training set used to build the model and a test set used to evaluate it.
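A minimal sketch of a random split, assuming 30% of the records go to the test set (the card gives no proportion); `split_data` and its parameters are hypothetical.

```python
import random


def split_data(records, test_fraction=0.3, seed=0):
    """Shuffle, then return (training, test) as two disjoint lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


modules = list(range(10))  # stand-ins for 10 module records
train, test = split_data(modules)
print(len(train), len(test))  # 7 3
```

The fixed seed only makes the split repeatable; any seed gives a valid split.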
Term
 re-substitution
Definition
 Build a model using the training (fit) data set, then substitute that same data set back into the model and measure its accuracy and other performance metrics.
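A minimal sketch of re-substitution with a hypothetical one-metric model: predict FP when a module's metric (for example lines of code) is at or above a cut-off, choosing the cut-off that maximizes accuracy on the fit data. The data and function names are made up for illustration. Because the model is scored on the same records it was fit to, this accuracy tends to be optimistic.

```python
def accuracy(cutoff, xs, labels):
    """Fraction of modules whose predicted label matches the actual label."""
    preds = ["FP" if x >= cutoff else "NFP" for x in xs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def fit_cutoff(xs, labels):
    """Choose the cut-off (among observed values) with the best fit accuracy."""
    return max(sorted(set(xs)), key=lambda t: accuracy(t, xs, labels))


# Hypothetical fit data: one metric per module and its actual class.
fit_x = [10, 25, 40, 55, 70, 85]
fit_y = ["NFP", "NFP", "FP", "NFP", "FP", "FP"]

cutoff = fit_cutoff(fit_x, fit_y)
# Re-substitution: evaluate on the same data the model was built from.
print(cutoff, accuracy(cutoff, fit_x, fit_y))  # 40 0.8333333333333334
```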
Term
 subsequent project
Definition
 Given a software project and its subsequent projects, use one project to build the model, then use the subsequent projects as test data.
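A minimal sketch using the same kind of hypothetical one-metric cut-off model: fit it on one project's modules, then measure accuracy on each subsequent project. All project names and data are invented for illustration.

```python
def accuracy(cutoff, xs, labels):
    """Fraction of modules whose predicted label (FP if x >= cutoff) is correct."""
    preds = ["FP" if x >= cutoff else "NFP" for x in xs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def fit_cutoff(xs, labels):
    """Choose the cut-off (among observed values) with the best fit accuracy."""
    return max(sorted(set(xs)), key=lambda t: accuracy(t, xs, labels))


# Hypothetical projects: (metric values, actual classes) per project.
projects = {
    "project_1": ([10, 25, 40, 55, 70, 85], ["NFP", "NFP", "FP", "NFP", "FP", "FP"]),
    "project_2": ([15, 45, 60, 90], ["NFP", "FP", "NFP", "FP"]),
    "project_3": ([20, 35, 50], ["NFP", "FP", "FP"]),
}

# Build the model from the first project only.
cutoff = fit_cutoff(*projects["project_1"])

# Test it on each subsequent project.
results = {name: accuracy(cutoff, xs, ys)
           for name, (xs, ys) in projects.items() if name != "project_1"}
print(results)  # {'project_2': 0.75, 'project_3': 0.6666666666666666}
```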
Term
 cross-validation
Definition
 Split the data into 10 parts. Use 9 parts to build the model and the remaining part to test it. Repeat this 10 times, so that each part is used as the test set once, then combine the results.
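A minimal sketch of 10-fold cross-validation, assuming records are shuffled once with a fixed seed and that "combine the results" means pooling the test predictions from all 10 folds into one accuracy. `k_fold_splits`, `cross_validate`, and the majority-class model in the usage example are hypothetical.

```python
import random


def k_fold_splits(n, k=10, seed=0):
    """Yield (train_idx, test_idx) k times; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(xs, ys, fit, predict, k=10):
    """Build on k-1 parts, test on the remaining part, repeat k times,
    and combine by pooling all test predictions into one accuracy."""
    correct = 0
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        correct += sum(predict(model, xs[j]) == ys[j] for j in test)
    return correct / len(xs)


def fit_majority(xs, ys):
    """Hypothetical trivial model: remember the most common class."""
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model


xs = list(range(20))                  # stand-in metric values
ys = ["NFP"] * 15 + ["FP"] * 5        # 15 NFP and 5 FP modules
print(cross_validate(xs, ys, fit_majority, predict_majority))  # 0.75
```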
Term
 Precision (efficiency)
Definition
 The proportion of modules classified as fault-prone that are actually fault-prone: the number of correctly predicted fault-prone modules divided by the number of modules classified as fault-prone.
Term
 Recall (effectiveness)
Definition
 The number of correctly predicted fault-prone modules divided by the total number of modules that are actually fault-prone.
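Both measures can be computed from the same counts. A minimal sketch, assuming "FP" marks a fault-prone module and returning 0.0 when a denominator is zero (a convention chosen here, not from the card); the sample labels are hypothetical.

```python
def precision_recall(actual, predicted, positive="FP"):
    """Return (precision, recall) for the fault-prone class."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(a == positive and p == positive for a, p in pairs)
    predicted_pos = sum(p == positive for _, p in pairs)  # classified fault-prone
    actual_pos = sum(a == positive for a, _ in pairs)     # actually fault-prone
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


actual = ["FP", "FP", "FP", "NFP", "NFP", "NFP"]
predicted = ["FP", "FP", "NFP", "FP", "FP", "NFP"]
print(precision_recall(actual, predicted))  # (0.5, 0.6666666666666666)
```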
Term
 For linear regression models, which attribute selection method (greedy, M5, or no selection) uses the largest number of independent variables?
Definition
 No selection, because none of the variables are pruned, so all of them stay in the model.
Term
 An over-fitted numeric prediction or classification model
Definition
 The model performs very well on the training data but very poorly on the test data (it is too good to be true).
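A minimal sketch of over-fitting, assuming a hypothetical model that memorizes every training module (a 1-nearest-neighbor predictor). The training data follow y = 2x plus alternating noise of +1 and -1, and the test data follow y = 2x without noise; all values are invented. The model reproduces the training noise exactly, so its training error is zero while its test error is not.

```python
def fit_memorizer(xs, ys):
    """Hypothetical over-fitted model: predict the y of the nearest training x."""
    data = list(zip(xs, ys))

    def predict(x):
        _, nearest_y = min(data, key=lambda d: abs(d[0] - x))
        return nearest_y

    return predict


def mse(model, xs, ys):
    """Mean squared error of the model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented training data: y = 2x plus alternating noise of +1 and -1.
train_x = list(range(10))
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]
# Invented test data from the same relationship, without the noise.
test_x = [x + 0.4 for x in range(10)]
test_y = [2 * x for x in test_x]

model = fit_memorizer(train_x, train_y)
print(mse(model, train_x, train_y))  # 0.0: perfect on the training data
print(mse(model, test_x, test_y))    # about 1.64: much worse on the test data
```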
Term
 #4 (see the document)
Definition
 class(xi) = NFP if (NFP/FP) > c, FP otherwise.
A mean value of 0.75 means 75% of modules are NFP and 25% are FP, so c = 0.75/0.25 = 3.
(a) 2 > 3 is false, so FP
(b) 5 > 3, so NFP
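A minimal sketch of the rule as stated on this card, assuming the cut-off c is the ratio of the NFP proportion to the FP proportion and that `ratio` is the NFP/FP value being tested; the function names are hypothetical.

```python
def prior_cutoff(p_nfp):
    """c = (proportion of NFP modules) / (proportion of FP modules)."""
    return p_nfp / (1.0 - p_nfp)


def classify(ratio, c):
    """class(xi) = NFP if the NFP/FP ratio exceeds c, FP otherwise."""
    return "NFP" if ratio > c else "FP"


c = prior_cutoff(0.75)                  # 0.75 / 0.25
print(c)                                # 3.0
print(classify(2, c), classify(5, c))   # FP NFP
```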
Term
 (5)
Definition
 (A) Predict the number of faults as the average of the two neighboring modules:
K (value 20) lies between C and E; faults C = 0, E = 1; (0 + 1)/2 = 0.5
L (value 50) lies between I and J; faults I = 7, J = 10; (7 + 10)/2 = 8.5
M (value 38) lies between H and I; faults H = 5, I = 7; (5 + 7)/2 = 6
(B) For dfp, average the distances to the 2 closest FP modules; for dnfp, average the distances to the 2 closest NFP modules. Then class(xi) = NFP if dfp/dnfp > c, FP otherwise, with c = 0.5.
K (20): dfp from 35 and 40: ((35 - 20) + (40 - 20))/2 = 17.5; dnfp from 21 and 22: ((21 - 20) + (22 - 20))/2 = 1.5; dfp/dnfp = 17.5/1.5 ≈ 11.7 > 0.5, so NFP
L (50): dfp from 40 and 55: (|40 - 50| + |55 - 50|)/2 = 7.5; dnfp from 29 and 30: (|29 - 50| + |30 - 50|)/2 = 20.5; dfp/dnfp = 7.5/20.5 ≈ 0.37, not > 0.5, so FP
M (38): dfp from 35 and 40: (|35 - 38| + |40 - 38|)/2 = 2.5; dnfp from 29 and 30: (|29 - 38| + |30 - 38|)/2 = 8.5; dfp/dnfp = 2.5/8.5 ≈ 0.29, not > 0.5, so FP
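A minimal sketch of part (B), assuming distances are absolute differences on a single metric and c = 0.5. The value lists hold only the FP and NFP metric values that appear on this card, not the full data set, so they are an assumption; the nearest neighbors of K, L, and M are drawn from these values. The function names are hypothetical.

```python
def mean_distance_to_nearest(x, values, k=2):
    """Average distance from x to its k closest values."""
    distances = sorted(abs(v - x) for v in values)
    return sum(distances[:k]) / k


def classify(x, fp_values, nfp_values, c=0.5):
    """class(x) = NFP if dfp/dnfp > c, FP otherwise."""
    d_fp = mean_distance_to_nearest(x, fp_values)
    d_nfp = mean_distance_to_nearest(x, nfp_values)
    return "NFP" if d_fp / d_nfp > c else "FP"


# Only the metric values that appear on this card; the full data set is not shown.
FP_VALUES = [35, 40, 55]
NFP_VALUES = [21, 22, 29, 30]

for name, x in [("K", 20), ("L", 50), ("M", 38)]:
    print(name, classify(x, FP_VALUES, NFP_VALUES))
# K NFP
# L FP
# M FP
```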
Term
 MOM steps
Definition
 1. Use numeric prediction to predict the number of faults in each module.
2. Order the modules by their predicted number of faults.
3. Evaluate the model's quality using the actual number of faults.
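A minimal sketch of the three steps, assuming step 1 has already produced a predicted fault count for each module, and measuring quality in step 3 as the share of faults captured by the top-ranked modules relative to a perfect ordering by actual faults. That measure, the module data, and the function names are assumptions for illustration.

```python
# Hypothetical modules: (name, predicted faults from step 1, actual faults).
MODULES = [
    ("A", 5.0, 4),
    ("B", 1.0, 0),
    ("C", 1.5, 6),
    ("D", 0.5, 1),
    ("E", 2.0, 2),
]


def mom_order(modules):
    """Step 2: order modules by predicted faults, highest first."""
    return sorted(modules, key=lambda m: m[1], reverse=True)


def captured_fraction(modules, top_n):
    """Step 3 (assumed measure): actual faults in the top_n predicted modules,
    divided by the most faults any top_n modules could contain."""
    captured = sum(actual for _, _, actual in mom_order(modules)[:top_n])
    best = sum(sorted((actual for _, _, actual in modules), reverse=True)[:top_n])
    return captured / best


print([name for name, _, _ in mom_order(MODULES)])  # ['A', 'E', 'C', 'B', 'D']
print(captured_fraction(MODULES, 2))                # 0.6
```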