Ensemble methods combine the predictions of many models. For example, in bagging (short for bootstrap aggregation), models are fit in parallel to m bootstrapped samples (e.g., m = 50), and the predictions from the m models are then averaged to obtain the prediction of the ensemble. In this tutorial we walk through the basics of three ensemble methods: bagging, random forests, and boosting.
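The bagging recipe described above can be sketched by hand. This is only an illustration under stated assumptions: it uses the built-in iris data with a made-up binary outcome (IsVirginica), rpart trees as base learners, and m = 50; none of these choices come from the analysis that follows.

```r
# Bagging by hand: a sketch using rpart and the built-in iris data
library(rpart)

set.seed(1234)

# Illustrative two-class outcome (an assumption, not part of the tutorial data)
dat <- iris
dat$IsVirginica <- factor(ifelse(dat$Species == "virginica", "Yes", "No"))
dat$Species <- NULL

# Hold out 50 rows for testing
test_rows <- sample(nrow(dat), 50)
train <- dat[-test_rows, ]
test  <- dat[test_rows, ]

m <- 50  # number of bootstrap samples

# Each column holds one tree's predicted probability of "Yes" for the test rows
prob_yes <- sapply(seq_len(m), function(i) {
  boot_rows <- sample(nrow(train), replace = TRUE)  # bootstrap sample of rows
  fit <- rpart(IsVirginica ~ ., data = train[boot_rows, ], method = "class")
  predict(fit, newdata = test, type = "prob")[, "Yes"]
})

# Aggregate: average the m probabilities, then classify at 0.5
bagged_prob  <- rowMeans(prob_yes)
bagged_class <- factor(ifelse(bagged_prob > 0.5, "Yes", "No"),
                       levels = c("No", "Yes"))

# Test accuracy of the bagged ensemble
mean(bagged_class == test$IsVirginica)
```

Averaging probabilities is one way to aggregate; a majority vote over the m class predictions is another common choice.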
In this session we cover …
library(psych) #for general functions
library(ggplot2) #for data visualization
# library(devtools)
# devtools::install_github('topepo/caret/pkg/caret') #May need the github version to correct a bug with parallelizing
library(caret) #for training and cross validation (also calls other model libraries)
## Warning: Installed Rcpp (0.12.13) different from Rcpp used to build dplyr (0.12.11).
## Please reinstall dplyr to avoid random crashes or undefined behavior.
library(rpart) #for trees
#library(rattle) # Fancy tree plot This is a difficult library to install (https://gist.github.com/zhiyzuo/a489ffdcc5da87f28f8589a55aa206dd)
library(rpart.plot) # Enhanced tree plots
library(RColorBrewer) # Color selection for fancy tree plot
library(party) # Alternative decision tree algorithm
library(partykit) # Convert rpart object to BinaryTree
library(pROC) #for ROC curves
library(ISLR) #for the Carseat Data
## Warning: package 'ISLR' was built under R version 3.4.2
Let's look at another data example … Reading in the Carseats data set: this is a simulated data set containing sales of child car seats at 400 different stores. Sales can be predicted by 10 other variables.
#loading the data
data("Carseats")
Let's have a quick look at the data file and the descriptives.
#data structure
head(Carseats,10)
## Sales CompPrice Income Advertising Population Price ShelveLoc Age
## 1 9.50 138 73 11 276 120 Bad 42
## 2 11.22 111 48 16 260 83 Good 65
## 3 10.06 113 35 10 269 80 Medium 59
## 4 7.40 117 100 4 466 97 Medium 55
## 5 4.15 141 64 3 340 128 Bad 38
## 6 10.81 124 113 13 501 72 Bad 78
## 7 6.63 115 105 0 45 108 Medium 71
## 8 11.85 136 81 15 425 120 Good 67
## 9 6.54 132 110 0 108 124 Medium 76
## 10 4.69 132 113 0 131 124 Medium 76
## Education Urban US
## 1 17 Yes Yes
## 2 10 Yes Yes
## 3 12 Yes Yes
## 4 14 Yes Yes
## 5 13 Yes No
## 6 16 No Yes
## 7 15 Yes No
## 8 10 Yes Yes
## 9 10 No No
## 10 17 No Yes
Our outcome of interest will be a binary version of Sales: unit sales (in thousands) at each location. (Note again that there is no id variable. This is convenient for some tasks.)
Descriptives
#sample descriptives
describe(Carseats)
## vars n mean sd median trimmed mad min max range
## Sales 1 400 7.50 2.82 7.49 7.43 2.87 0 16.27 16.27
## CompPrice 2 400 124.97 15.33 125.00 125.04 14.83 77 175.00 98.00
## Income 3 400 68.66 27.99 69.00 68.26 35.58 21 120.00 99.00
## Advertising 4 400 6.63 6.65 5.00 5.89 7.41 0 29.00 29.00
## Population 5 400 264.84 147.38 272.00 265.56 191.26 10 509.00 499.00
## Price 6 400 115.80 23.68 117.00 115.92 22.24 24 191.00 167.00
## ShelveLoc* 7 400 2.31 0.83 3.00 2.38 0.00 1 3.00 2.00
## Age 8 400 53.32 16.20 54.50 53.48 20.02 25 80.00 55.00
## Education 9 400 13.90 2.62 14.00 13.88 2.97 10 18.00 8.00
## Urban* 10 400 1.70 0.46 2.00 1.76 0.00 1 2.00 1.00
## US* 11 400 1.64 0.48 2.00 1.68 0.00 1 2.00 1.00
## skew kurtosis se
## Sales 0.18 -0.11 0.14
## CompPrice -0.04 0.01 0.77
## Income 0.05 -1.10 1.40
## Advertising 0.63 -0.57 0.33
## Population -0.05 -1.21 7.37
## Price -0.12 0.41 1.18
## ShelveLoc* -0.62 -1.28 0.04
## Age -0.08 -1.14 0.81
## Education 0.04 -1.31 0.13
## Urban* -0.90 -1.20 0.02
## US* -0.60 -1.64 0.02
#histogram of outcome
ggplot(data=Carseats, aes(x=Sales)) +
geom_histogram(binwidth=1, boundary=.5, fill="white", color="black") +
geom_vline(xintercept = 8, color="red", size=2) +
labs(x = "Sales")
For convenience of didactic illustration we create a new variable HighSales that is binary: “No” if Sales <= 8, and “Yes” otherwise.
#creating new binary variable
Carseats$HighSales=ifelse(Carseats$Sales<=8,"No","Yes")
Some Data cleanup
#remove old variable
Carseats$Sales <- NULL
#convert a factor variable into a numeric variable
#(levels are coded in alphabetical order: Bad = 1, Good = 2, Medium = 3)
Carseats$ShelveLoc <- as.numeric(Carseats$ShelveLoc)
We split the data - half for Training, half for Testing
#random sample half the rows
halfsample = sample(dim(Carseats)[1], dim(Carseats)[1]/2) # half of sample
#create training and test data sets
Carseats.train = Carseats[halfsample, ]
Carseats.test = Carseats[-halfsample, ]
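The split above draws rows at random before any seed has been set, so the rows that land in each half change from run to run. A minimal sketch of a reproducible split follows; it uses a stand-in data frame (toy, a hypothetical name) with 400 rows rather than Carseats, and the seed value is arbitrary.

```r
# Stand-in data frame with 400 rows (the same size as Carseats)
toy <- data.frame(id = 1:400, x = rnorm(400))

set.seed(1234)                       # fix the random draw before sampling
n <- nrow(toy)
half <- sample(n, floor(n / 2))      # row indices for the training half

toy.train <- toy[half, ]
toy.test  <- toy[-half, ]

# The two halves are disjoint and together cover every row
nrow(toy.train)
nrow(toy.test)
length(intersect(toy.train$id, toy.test$id))
```

With the seed set first, rerunning the chunk selects the same rows, so accuracies quoted in the text stay in step with the printed output.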
We will use these to evaluate a variety of different classification algorithms: a single classification tree, bagged trees, random forests, conditional inference forests (cforest), and gradient boosting.
First, we set up the cross validation control
#Setting the random seed for replication
set.seed(1234)
#setting up cross-validation
cvcontrol <- trainControl(method="repeatedcv", number = 10,
allowParallel=TRUE)
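To make concrete what 10-fold cross-validation does, here is a hand-rolled sketch in base R. It is not what caret runs internally; the simulated data, the logistic regression used as the model, and the 0.5 cut-off are all assumptions chosen to keep the example small.

```r
set.seed(1234)

# Simulated two-class data (illustrative only)
n <- 200
x <- rnorm(n)
y <- factor(ifelse(x + rnorm(n) > 0, "Yes", "No"))
dat <- data.frame(x = x, y = y)

k <- 10
fold <- sample(rep(seq_len(k), length.out = n))  # random fold label per row

# For each fold: fit on the other nine folds, score on the held-out fold
acc <- sapply(seq_len(k), function(j) {
  fit  <- glm(y ~ x, data = dat[fold != j, ], family = binomial)
  p    <- predict(fit, newdata = dat[fold == j, ], type = "response")
  pred <- ifelse(p > 0.5, "Yes", "No")
  mean(pred == dat$y[fold == j])                 # accuracy on held-out fold
})

mean(acc)   # cross-validated accuracy estimate
```

Each of the ten fits uses 180 rows, which matches the "Summary of sample sizes: 180, 180, ..." line that caret prints for a 200-row training set.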
We first optimize the fit of a classification tree. Our objective with the cross-validation is to optimize the size of the tree by tuning mincriterion, the parameter that caret varies for ctree.
train.tree <- train(as.factor(HighSales) ~ .,
data=Carseats.train,
method="ctree",
trControl=cvcontrol,
tuneLength = 10)
train.tree
## Conditional Inference Tree
##
## 200 samples
## 10 predictor
## 2 classes: 'No', 'Yes'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## mincriterion Accuracy Kappa
## 0.0100000 0.570 0.11907115
## 0.1188889 0.570 0.11907115
## 0.2277778 0.560 0.09628222
## 0.3366667 0.560 0.09758445
## 0.4455556 0.570 0.11934915
## 0.5544444 0.570 0.11934915
## 0.6633333 0.580 0.14348169
## 0.7722222 0.585 0.15642361
## 0.8811111 0.600 0.19649796
## 0.9900000 0.560 0.09070466
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mincriterion = 0.8811111.
plot(train.tree)
We see that the accuracy is maximized at a relatively less complex tree (a high value of mincriterion).
Look at the final tree
# plot tree
plot(train.tree$finalModel,
main="Classification Tree for Carseat High Sales")
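To evaluate the accuracy of the tree we can look at the confusion matrix for the Training data. Note that the call below passes the observed classes first, so the rows labelled Prediction in the output hold the observed classes and the columns labelled Reference hold the predicted classes.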
#obtaining class predictions
tree.classTrain <- predict(train.tree,
type="raw")
head(tree.classTrain)
## [1] Yes No No Yes Yes Yes
## Levels: No Yes
#computing confusion matrix
confusionMatrix(Carseats.train$HighSales,tree.classTrain)
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 62 48
## Yes 8 82
##
## Accuracy : 0.72
## 95% CI : (0.6523, 0.781)
## No Information Rate : 0.65
## P-Value [Acc > NIR] : 0.02131
##
## Kappa : 0.4563
## Mcnemar's Test P-Value : 1.872e-07
##
## Sensitivity : 0.8857
## Specificity : 0.6308
## Pos Pred Value : 0.5636
## Neg Pred Value : 0.9111
## Prevalence : 0.3500
## Detection Rate : 0.3100
## Detection Prevalence : 0.5500
## Balanced Accuracy : 0.7582
##
## 'Positive' Class : No
##
Some errors, even on the data the tree was fit to (accuracy of 0.72). But the model was learned.
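The statistics printed under the training confusion matrix can be recomputed by hand from the four cell counts, which helps when reading the output. The sketch below copies the counts from the output above; the object names (cm, accuracy, and so on) are illustrative. As in the printed table, rows hold the first argument passed to confusionMatrix() and columns the second, and "No" is treated as the positive class.

```r
# Counts copied from the training confusion matrix above (filled column-wise)
cm <- matrix(c(62, 8, 48, 82), nrow = 2,
             dimnames = list(Row = c("No", "Yes"), Col = c("No", "Yes")))

accuracy    <- sum(diag(cm)) / sum(cm)              # agreement on the diagonal
sensitivity <- cm["No", "No"] / sum(cm[, "No"])     # within the "No" column
specificity <- cm["Yes", "Yes"] / sum(cm[, "Yes"])  # within the "Yes" column
ppv         <- cm["No", "No"] / sum(cm["No", ])     # within the "No" row

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv), 4)
# accuracy 0.7200, sensitivity 0.8857, specificity 0.6308, ppv 0.5636
```

These values match the Accuracy, Sensitivity, Specificity, and Pos Pred Value lines in the printed output.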
More interesting is the confusion matrix when applied to the Test data.
#obtaining class predictions
tree.classTest <- predict(train.tree,
newdata = Carseats.test,
type="raw")
head(tree.classTest)
## [1] Yes No Yes Yes Yes Yes
## Levels: No Yes
#computing confusion matrix
confusionMatrix(Carseats.test$HighSales,tree.classTest)
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 63 63
## Yes 14 60
##
## Accuracy : 0.615
## 95% CI : (0.5438, 0.6828)
## No Information Rate : 0.615
## P-Value [Acc > NIR] : 0.5312
##
## Kappa : 0.2734
## Mcnemar's Test P-Value : 4.498e-08
##
## Sensitivity : 0.8182
## Specificity : 0.4878
## Pos Pred Value : 0.5000
## Neg Pred Value : 0.8108
## Prevalence : 0.3850
## Detection Rate : 0.3150
## Detection Prevalence : 0.6300
## Balanced Accuracy : 0.6530
##
## 'Positive' Class : No
##
Accuracy of 0.615, no better than the no-information rate of 0.615.
When evaluating classification models, a few other functions may be useful. For example, the pROC package provides convenience functions for calculating sensitivity and specificity across classification thresholds, and for obtaining and plotting ROC curves. We can also look at the ROC curve by extracting the predicted probabilities of “Yes”.
#Obtaining predicted probabilites for Test data
tree.probs=predict(train.tree,
newdata=Carseats.test,
type="prob")
head(tree.probs)
## No Yes
## 1 0.4473684 0.5526316
## 2 0.8709677 0.1290323
## 3 0.2962963 0.7037037
## 4 0.2962963 0.7037037
## 5 0.2962963 0.7037037
## 6 0.4473684 0.5526316
#Calculate ROC curve
rocCurve.tree <- roc(Carseats.test$HighSales,tree.probs[,"Yes"])
#plot the ROC curve
plot(rocCurve.tree,col=c(4))
#calculate the area under curve (bigger is better)
auc(rocCurve.tree)
## Area under the curve: 0.6714
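The area under the ROC curve has a direct interpretation: it is the probability that a randomly chosen “Yes” case receives a higher predicted probability than a randomly chosen “No” case, with ties counted as one half. The sketch below computes it from ranks in base R. The function name auc_by_rank and the six toy scores are hypothetical; this is not how pROC is implemented, only an equivalent calculation.

```r
# AUC from ranks (equivalent to the Mann-Whitney U statistic, rescaled)
auc_by_rank <- function(truth, score, positive = "Yes") {
  is_pos <- truth == positive
  n_pos  <- sum(is_pos)
  n_neg  <- sum(!is_pos)
  r <- rank(score)                      # average ranks, so ties count as 1/2
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 3 "Yes" and 3 "No" cases with made-up scores
truth <- c("No", "No", "Yes", "No", "Yes", "Yes")
score <- c(0.10, 0.40, 0.35, 0.80, 0.70, 0.90)

auc_by_rank(truth, score)   # 6 of the 9 Yes/No pairs are ordered correctly
```

Handling ties matters for a single tree, because every case in the same terminal node receives the same predicted probability.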
Training the model using treebag
Bagged trees fit with treebag have no tuning parameters in caret, so the cross-validation here only estimates the accuracy of the ensemble.
#Fix data file for use in bag() function
# Carseats2 <- Carseats.train
# Carseats2$Urban <- as.factor(Carseats2$Urban)
# Carseats2$US <- as.factor(Carseats2$US)
# Carseats2$HighSales <- as.factor(Carseats2$HighSales)
#
# train.bagg <- bag(Carseats2[,-11],Carseats2[,11], B = 10
# ,
# bagControl = bagControl(fit = ctreeBag$fit,
# predict = ctreeBag$pred,
# aggregate = ctreeBag$aggregate))
#Using treebag
train.bagg <- train(as.factor(HighSales) ~ .,
data=Carseats.train,
method="treebag",
trControl=cvcontrol,
importance=TRUE)
train.bagg
## Bagged CART
##
## 200 samples
## 10 predictor
## 2 classes: 'No', 'Yes'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results:
##
## Accuracy Kappa
## 0.75 0.4963593
plot(varImp(train.bagg))
Not yet sure how to parse the model details from the output in order to look at the collection of trees.
Look at the collection of final trees
To evaluate the accuracy of the Bagged Trees we can look at the confusion matrix for the Training data.
#obtaining class predictions
bagg.classTrain <- predict(train.bagg,
type="raw")
head(bagg.classTrain)
## [1] No No No Yes Yes Yes
## Levels: No Yes
#computing confusion matrix
confusionMatrix(Carseats.train$HighSales,bagg.classTrain)
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 110 0
## Yes 0 90
##
## Accuracy : 1
## 95% CI : (0.9817, 1)
## No Information Rate : 0.55
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.00
## Specificity : 1.00
## Pos Pred Value : 1.00
## Neg Pred Value : 1.00
## Prevalence : 0.55
## Detection Rate : 0.55
## Detection Prevalence : 0.55
## Balanced Accuracy : 1.00
##
## 'Positive' Class : No
##
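The training accuracy is perfect! This mainly shows that the bagged trees fit the training data very closely; it says little about new data.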
More interesting is the confusion matrix when applied to the Test data.
#obtaining class predictions
bagg.classTest <- predict(train.bagg,
newdata = Carseats.test,
type="raw")
head(bagg.classTest)
## [1] No No No No No Yes
## Levels: No Yes
#computing confusion matrix
confusionMatrix(Carseats.test$HighSales,bagg.classTest)
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 107 19
## Yes 16 58
##
## Accuracy : 0.825
## 95% CI : (0.7651, 0.875)
## No Information Rate : 0.615
## P-Value [Acc > NIR] : 9.519e-11
##
## Kappa : 0.6277
## Mcnemar's Test P-Value : 0.7353
##
## Sensitivity : 0.8699
## Specificity : 0.7532
## Pos Pred Value : 0.8492
## Neg Pred Value : 0.7838
## Prevalence : 0.6150
## Detection Rate : 0.5350
## Detection Prevalence : 0.6300
## Balanced Accuracy : 0.8116
##
## 'Positive' Class : No
##
Accuracy of 0.825
We can also look at the ROC curve by extracting the predicted probabilities of “Yes”.
#Obtaining predicted probabilites for Test data
bagg.probs=predict(train.bagg,
newdata=Carseats.test,
type="prob")
head(bagg.probs)
## No Yes
## 1 0.96 0.04
## 2 0.60 0.40
## 3 0.96 0.04
## 4 0.52 0.48
## 5 0.72 0.28
## 6 0.04 0.96
#Calculate ROC curve
rocCurve.bagg <- roc(Carseats.test$HighSales,bagg.probs[,"Yes"])
#plot the ROC curve
plot(rocCurve.bagg,col=c(6))
#calculate the area under curve (bigger is better)
auc(rocCurve.bagg)
## Area under the curve: 0.8904
Training the model using random forest
train.rf <- train(as.factor(HighSales) ~ .,
data=Carseats.train,
method="rf",
trControl=cvcontrol,
#tuneLength = 3,
importance=TRUE)
train.rf
## Random Forest
##
## 200 samples
## 10 predictor
## 2 classes: 'No', 'Yes'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.775 0.5397404
## 6 0.755 0.5059471
## 10 0.775 0.5441237
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
We can look at the confusion matrix for the Training data.
#obtaining class predictions
rf.classTrain <- predict(train.rf,
type="raw")
head(rf.classTrain)
## [1] No No No Yes Yes Yes
## Levels: No Yes
#computing confusion matrix
confusionMatrix(Carseats.train$HighSales,rf.classTrain)
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 110 0
## Yes 0 90
##
## Accuracy : 1
## 95% CI : (0.9817, 1)
## No Information Rate : 0.55
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.00
## Specificity : 1.00
## Pos Pred Value : 1.00
## Neg Pred Value : 1.00
## Prevalence : 0.55
## Detection Rate : 0.55
## Detection Prevalence : 0.55
## Balanced Accuracy : 1.00
##
## 'Positive' Class : No
##
No errors. The forest reproduces the training data perfectly, so the Test data is the better guide to performance.
More interesting is the confusion matrix when applied to the Test data.
#obtaining class predictions
rf.classTest <- predict(train.rf,
newdata = Carseats.test,
type="raw")
head(rf.classTest)
## [1] No No No No No Yes
## Levels: No Yes
#computing confusion matrix
confusionMatrix(Carseats.test$HighSales,rf.classTest)
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 116 10
## Yes 22 52
##
## Accuracy : 0.84
## 95% CI : (0.7817, 0.8879)
## No Information Rate : 0.69
## P-Value [Acc > NIR] : 9.004e-07
##
## Kappa : 0.6449
## Mcnemar's Test P-Value : 0.05183
##
## Sensitivity : 0.8406
## Specificity : 0.8387
## Pos Pred Value : 0.9206
## Neg Pred Value : 0.7027
## Prevalence : 0.6900
## Detection Rate : 0.5800
## Detection Prevalence : 0.6300
## Balanced Accuracy : 0.8396
##
## 'Positive' Class : No
##
Accuracy of 0.84, a small improvement over bagging alone (0.825).
We can also look at the ROC curve by extracting the predicted probabilities of “Yes”.
#Obtaining predicted probabilites for Test data
rf.probs=predict(train.rf,
newdata=Carseats.test,
type="prob")
head(rf.probs)
## No Yes
## 1 0.686 0.314
## 4 0.588 0.412
## 5 0.762 0.238
## 9 0.570 0.430
## 10 0.646 0.354
## 18 0.298 0.702
#Calculate ROC curve
rocCurve.rf <- roc(Carseats.test$HighSales,rf.probs[,"Yes"])
#plot the ROC curve
plot(rocCurve.rf,col=c(1))
#calculate the area under curve (bigger is better)
auc(rocCurve.rf)
## Area under the curve: 0.9021
An implementation of the random forest and bagging ensemble algorithms utilizing conditional inference trees as base learners (from the party package)
train.cf <- train(HighSales ~ ., #cforest knows the outcome is binary (unlike rf)
data=Carseats.train,
method="cforest",
trControl=cvcontrol) #Note that importance not available here
train.cf
## Conditional Inference Random Forest
##
## 200 samples
## 10 predictor
## 2 classes: 'No', 'Yes'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.645 0.2429219
## 6 0.735 0.4504639
## 10 0.705 0.3909498
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 6.
We can look at the confusion matrix for the Training data.
#obtaining class predictions
cf.classTrain <- predict(train.cf,
type="raw")
head(cf.classTrain)
## [1] No No No Yes Yes Yes
## Levels: No Yes
#computing confusion matrix
confusionMatrix(Carseats.train$HighSales,cf.classTrain)
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 106 4
## Yes 11 79
##
## Accuracy : 0.925
## 95% CI : (0.8793, 0.9574)
## No Information Rate : 0.585
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8474
## Mcnemar's Test P-Value : 0.1213
##
## Sensitivity : 0.9060
## Specificity : 0.9518
## Pos Pred Value : 0.9636
## Neg Pred Value : 0.8778
## Prevalence : 0.5850
## Detection Rate : 0.5300
## Detection Prevalence : 0.5500
## Balanced Accuracy : 0.9289
##
## 'Positive' Class : No
##
A few Errors. Model learned pretty well.
More interesting is the confusion matrix when applied to the Test data.
#obtaining class predictions
cf.classTest <- predict(train.cf,
newdata = Carseats.test,
type="raw")
head(cf.classTest)
## [1] No No No No No Yes
## Levels: No Yes
#computing confusion matrix
confusionMatrix(Carseats.test$HighSales,cf.classTest)
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 119 7
## Yes 22 52
##
## Accuracy : 0.855
## 95% CI : (0.7984, 0.9007)
## No Information Rate : 0.705
## P-Value [Acc > NIR] : 5.477e-07
##
## Kappa : 0.6754
## Mcnemar's Test P-Value : 0.00933
##
## Sensitivity : 0.8440
## Specificity : 0.8814
## Pos Pred Value : 0.9444
## Neg Pred Value : 0.7027
## Prevalence : 0.7050
## Detection Rate : 0.5950
## Detection Prevalence : 0.6300
## Balanced Accuracy : 0.8627
##
## 'Positive' Class : No
##
Accuracy of 0.855
We can also look at the ROC curve by extracting the predicted probabilities of “Yes”.
#Obtaining predicted probabilites for Test data
cf.probs=predict(train.cf,
newdata=Carseats.test,
type="prob")
head(cf.probs)
## No Yes
## 1 0.5551222 0.4448778
## 2 0.6379772 0.3620228
## 3 0.7206398 0.2793602
## 4 0.5318676 0.4681324
## 5 0.5603060 0.4396940
## 6 0.3079523 0.6920477
#Calculate ROC curve
rocCurve.cf <- roc(Carseats.test$HighSales,cf.probs[,"Yes"])
#plot the ROC curve
plot(rocCurve.cf,col=c(2))
#calculate the area under curve (bigger is better)
auc(rocCurve.cf)
## Area under the curve: 0.9299
It is possible to use a variety of packages for boosting, including “gbm”, “ada”, and “xgbLinear”, all of which can be accessed through caret. We can look up the various tuning parameters with modelLookup().
modelLookup("ada")
## model parameter label forReg forClass probModel
## 1 ada iter #Trees FALSE TRUE TRUE
## 2 ada maxdepth Max Tree Depth FALSE TRUE TRUE
## 3 ada nu Learning Rate FALSE TRUE TRUE
modelLookup("gbm")
## model parameter label forReg forClass
## 1 gbm n.trees # Boosting Iterations TRUE TRUE
## 2 gbm interaction.depth Max Tree Depth TRUE TRUE
## 3 gbm shrinkage Shrinkage TRUE TRUE
## 4 gbm n.minobsinnode Min. Terminal Node Size TRUE TRUE
## probModel
## 1 TRUE
## 2 TRUE
## 3 TRUE
## 4 TRUE
Here, we use gradient boosting. Example tuning parameters for “gbm” are described at http://topepo.github.io/caret/training.html
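The four gbm tuning parameters listed by modelLookup("gbm") can be varied together by building a grid of candidate values. The sketch below only constructs such a grid with base R; the particular values and the name gbmGrid are assumptions for illustration, and the train() call is shown as a comment because it needs the objects created elsewhere in this tutorial.

```r
# Candidate values for the four gbm tuning parameters (values are illustrative)
gbmGrid <- expand.grid(interaction.depth = c(1, 2, 3),
                       n.trees = c(50, 100, 150),
                       shrinkage = c(0.01, 0.1),
                       n.minobsinnode = 10)

nrow(gbmGrid)   # number of candidate models: 3 * 3 * 2 * 1 = 18
head(gbmGrid)

# The grid would be passed to train() through its tuneGrid argument, e.g.
# train(as.factor(HighSales) ~ ., data = Carseats.train, method = "gbm",
#       trControl = cvcontrol, tuneGrid = gbmGrid, verbose = FALSE)
```

Without a tuneGrid, caret picks its own small default grid, which is why the output below varies only interaction.depth and n.trees.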
Training with gradient boosting
train.gbm <- train(as.factor(HighSales) ~ .,
data=Carseats.train,
method="gbm",
verbose=FALSE,
trControl=cvcontrol)
train.gbm
## Stochastic Gradient Boosting
##
## 200 samples
## 10 predictor
## 2 classes: 'No', 'Yes'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 1 times)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## interaction.depth n.trees Accuracy Kappa
## 1 50 0.730 0.4511090
## 1 100 0.785 0.5654964
## 1 150 0.805 0.6066137
## 2 50 0.790 0.5702364
## 2 100 0.820 0.6342360
## 2 150 0.795 0.5864473
## 3 50 0.785 0.5648166
## 3 100 0.795 0.5864562
## 3 150 0.810 0.6174929
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were n.trees = 100,
## interaction.depth = 2, shrinkage = 0.1 and n.minobsinnode = 10.
We can look at the confusion matrix for the Training data.
#obtaining class predictions
gbm.classTrain <- predict(train.gbm,
type="raw")
head(gbm.classTrain)
## [1] No No No Yes Yes Yes
## Levels: No Yes
#computing confusion matrix
confusionMatrix(Carseats.train$HighSales,gbm.classTrain)
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 105 5
## Yes 6 84
##
## Accuracy : 0.945
## 95% CI : (0.9037, 0.9722)
## No Information Rate : 0.555
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8888
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9459
## Specificity : 0.9438
## Pos Pred Value : 0.9545
## Neg Pred Value : 0.9333
## Prevalence : 0.5550
## Detection Rate : 0.5250
## Detection Prevalence : 0.5500
## Balanced Accuracy : 0.9449
##
## 'Positive' Class : No
##
A few Errors. Model learned quite well.
More interesting is the confusion matrix when applied to the Test data.
#obtaining class predictions
gbm.classTest <- predict(train.gbm,
newdata = Carseats.test,
type="raw")
head(gbm.classTest)
## [1] No No No No No Yes
## Levels: No Yes
#computing confusion matrix
confusionMatrix(Carseats.test$HighSales,gbm.classTest)
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 115 11
## Yes 14 60
##
## Accuracy : 0.875
## 95% CI : (0.821, 0.9174)
## No Information Rate : 0.645
## P-Value [Acc > NIR] : 1.627e-13
##
## Kappa : 0.7296
## Mcnemar's Test P-Value : 0.6892
##
## Sensitivity : 0.8915
## Specificity : 0.8451
## Pos Pred Value : 0.9127
## Neg Pred Value : 0.8108
## Prevalence : 0.6450
## Detection Rate : 0.5750
## Detection Prevalence : 0.6300
## Balanced Accuracy : 0.8683
##
## 'Positive' Class : No
##
Accuracy of 0.875
We can also look at the ROC curve by extracting the predicted probabilities of “Yes”.
#Obtaining predicted probabilites for Test data
gbm.probs=predict(train.gbm,
newdata=Carseats.test,
type="prob")
head(gbm.probs)
## No Yes
## 1 0.70521837 0.2947816
## 2 0.56658110 0.4334189
## 3 0.85531345 0.1446865
## 4 0.67297281 0.3270272
## 5 0.73232024 0.2676798
## 6 0.04450397 0.9554960
#Calculate ROC curve
rocCurve.gbm <- roc(Carseats.test$HighSales,gbm.probs[,"Yes"])
#plot the ROC curve
plot(rocCurve.gbm, col=c(3))
#calculate the area under curve (bigger is better)
auc(rocCurve.gbm)
## Area under the curve: 0.9453
See …
https://machinelearningmastery.com/machine-learning-ensembles-with-r/
https://www.analyticsvidhya.com/blog/2017/02/introduction-to-ensembling-along-with-implementation-in-r/
We can examine how the models do by looking at the ROC curves.
plot(rocCurve.tree,col=c(4))
plot(rocCurve.bagg,add=TRUE,col=c(6)) # color magenta is bagg
plot(rocCurve.rf,add=TRUE,col=c(1)) # color black is rf
plot(rocCurve.cf,add=TRUE,col=c(2)) # color red is cforest
plot(rocCurve.gbm,add=TRUE,col=c(3)) # color green is gbm
Tree = blue, Bagg = magenta, RF = black, CForest = red, gradient boosting = green
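For this example, the ensemble methods clearly outperform the single tree on the test data: the AUC is 0.67 for the tree, 0.89 for bagging, 0.90 for random forests, 0.93 for cforest, and 0.95 for gradient boosting. Comparing the variable importance metrics to the decision tree results is a way to see how likely the tree is to generalize.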
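Alongside the overlaid ROC curves, a small table makes the comparison easier to read. The sketch below simply collects the test-set accuracy and AUC values printed earlier in this tutorial into a data frame (results is a hypothetical name); with the fitted objects at hand, the same numbers could be pulled from the confusion matrices and from auc() instead of being typed in.

```r
# Test-set results copied from the output shown above
results <- data.frame(
  model = c("Tree (ctree)", "Bagging (treebag)", "Random forest (rf)",
            "CForest (cforest)", "Gradient boosting (gbm)"),
  accuracy = c(0.615, 0.825, 0.840, 0.855, 0.875),
  auc      = c(0.6714, 0.8904, 0.9021, 0.9299, 0.9453)
)

# Sort from best to worst AUC
results[order(-results$auc), ]
```

Because the train/test split was drawn without a fixed seed, these figures describe one particular split and will shift somewhat on a rerun.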
Thank you for playing!