class: center, middle, inverse, title-slide

# Statistical Learning and Ensemble Methods
### ⚔
Mateus Maia
Joint work with Anderson Ara
Master’s student in Statistics at the Federal University of Bahia
Supported by CAPES/CNPq
### 2019-05-03

---
# Outline

#### Ensemble Methods

--

#### Bagging Methods
- Random Forest

--

#### Boosting
- AdaBoosting
- Gradient Boosting

---
class: inverse, center, middle

# Ensemble Models

---
# Statistical Learning Models

<br/>
<br/>
.center[]

---
# Ensemble Models

.center[]

---
# Ensemble Models

<br/>
<br/>
.center[]

---
# Ensemble Models

<br/>
<br/>
.center[]

---
class: inverse, center, middle

# Bagging

---
## .center[*Bootstrap Aggregating*]

### Regression Problems

`$$\hat{G}_{bag}(\mathbf{x})=\frac{1}{B}\sum_{b=1}^{B}\hat{g}^{*b}(\mathbf{x})$$`

### Classification Problems (labels coded as -1 and +1)

`$$\hat{G}_{bag}(\mathbf{x})=\operatorname{sign}\left(\sum_{b=1}^{B}\hat{g}^{*b}(\mathbf{x})\right)$$`

---
# Bagging: Pseudo code

1. Draw *B* bootstrap samples: each one is sampled with replacement from the data and has the same size as the whole data set.

--

1. Train one model on each bootstrap sample.

--

1. The ensemble prediction is given by:
   **the mean** of the predictions from all models (for regression tasks)<br/>
   **the majority vote** among all models (for classification tasks)

---
background-image: url(forest.jpg)
background-size: cover
class: inverse, center, middle

# Random Forest

---
# Random Forest

- Bagging with decision trees
- Bootstrap the data for each tree *and* subsample the features considered at each split
- Reduces variance, has few hyperparameters, and is easy to parallelise
- No reduction of bias

---
# Bias, variance and Ensemble

.center[]

---
background-image: url(code_time.gif)
background-size: cover
class: inverse, center, middle

# Code Time!
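---
# Bagging: a minimal R sketch

The bagging pseudo code can be sketched in a few lines of R. This is an illustration under stated assumptions, not the code run in the talk: it assumes `rpart` trees as base learners and the built-in `mtcars` and `iris` data sets, and the helpers `fit_bagging()`, `predict_mean()` and `predict_vote()` are hypothetical names introduced only for this example.

```r
library(rpart)

set.seed(2019)  # assumption: fixed seed, only to make the sketch reproducible

# Hypothetical helper: fit one rpart tree per bootstrap sample.
fit_bagging <- function(formula, data, B = 25) {
  lapply(seq_len(B), function(b) {
    idx <- sample(nrow(data), replace = TRUE)  # bootstrap sample of size n
    rpart(formula, data = data[idx, ])
  })
}

# Regression: the mean of the B predictions.
predict_mean <- function(models, newdata) {
  preds <- do.call(cbind, lapply(models, predict, newdata = newdata))
  rowMeans(preds)
}

# Classification: the majority vote among the B predicted classes.
predict_vote <- function(models, newdata) {
  votes <- do.call(cbind, lapply(models, function(m) {
    as.character(predict(m, newdata = newdata, type = "class"))
  }))
  apply(votes, 1, function(v) names(which.max(table(v))))
}

reg_models <- fit_bagging(mpg ~ wt + hp, data = mtcars)
y_hat <- predict_mean(reg_models, mtcars)

cls_models <- fit_bagging(Species ~ ., data = iris)
class_hat <- predict_vote(cls_models, iris)
mean(class_hat == iris$Species)  # training accuracy of the bagged trees
```

This is plain bagging: every tree may use every feature at every split, so it is not yet a random forest.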
---
class: inverse, center, middle

# Boosting

---
# Boosting Models

- **Adaptive Boosting (Freund and Schapire, 1999)** <br/> <br/>

--

- Gradient Boosting (Friedman, 2001) <br/> <br/>

--

- Stochastic Gradient Boosting (Friedman, 2002) <br/> <br/>

--

- eXtreme Gradient Boosting (Chen and Guestrin, 2016) <br/> <br/>

--

- LightGBM (Ke et al., 2017)

---
class: inverse, center, middle

# Adaptive Boosting
## (AdaBoosting)

---
# AdaBoosting

Given

`$$\mathbf{y}\in\{1,-1\}$$`
<br/>
<br/>

`$$G(\mathbf{x})=\operatorname{sign}\left(\sum_{m=1}^{M} \alpha_{m}g_{m}(\mathbf{x}) \right)$$`

--

<br/>
<br/>
## .center[The weighted wisdom of a crowd of experts]

---
# AdaBoosting

<br/>
<br/>
.center[]

---
background-image: url(macaco_fazendo_conta.gif)
background-size: cover
class: center, middle, inverse

# Algebra Time!

---
# AdaBoosting

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> </tr> </thead>
<tbody>
<tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 205 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr>
<tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr>
<tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 210 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr>
<tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td
style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 156 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> </tbody> </table> --- #AdaBoosting <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> <th style="text-align:center;"> Weights </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 205 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;"> 0.125 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;"> 0.125 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td 
style="text-align:center;"> No </td> <td style="text-align:center;"> 210 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;"> 0.125 </td> </tr>
<tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;"> 0.125 </td> </tr>
<tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 156 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;"> 0.125 </td> </tr>
<tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;"> 0.125 </td> </tr>
<tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;"> 0.125 </td> </tr>
<tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;"> 0.125 </td> </tr>
</tbody> </table>

---
# AdaBoost

```r
library(rpart)
library(rpart.plot)

# medical_care is the 8-patient data set shown in the tables;
# its construction is not shown on these slides.
# maxdepth = 1 grows a decision stump; minsplit = 2 allows a split on this tiny sample.
g1 <- rpart(Heart_Disease ~ Chest_Pain + Blocked_Arteries + Patient_Weight,
            data = medical_care,
            control = rpart.control(maxdepth = 1, minsplit = 2))

# Plot the fitted stump.
rpart.plot(g1)
```

<img src="xaringan_presentation_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---
# AdaBoosting

<table class="table table-striped" style="width: auto !important; margin-left: auto;
margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> <th style="text-align:center;"> g1 </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 205 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 210 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: red !important;">No</span> </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 156 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: 
#243B4A !important;"> <span style=" color: green !important;">No</span> </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> </tr> </tbody> </table> -- `$$\alpha_{1}=\frac{1}{2}log\left(\frac{1-\epsilon_{1}}{\epsilon_{1}} \right)$$` -- `$$\alpha_{1}=\frac{1}{2}log\left(\frac{1-\epsilon_{1}}{\epsilon_{1}} \right)=\frac{1}{2}log\left(\frac{1-0.125}{0.125} \right)=0.973$$` --- #Voting Power `$$\alpha_{1}=\frac{1}{2}log\left(\frac{1-\epsilon_{1}}{\epsilon_{1}} \right)$$` <img src="xaringan_presentation_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> --- #AdaBoosting <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> <th style="text-align:center;"> g1 </th> <th 
style="text-align:center;"> New_Weights </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 205 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 210 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: red !important;">No</span> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 156 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td 
style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> </td> </tr> </tbody> </table> -- `$$New\;weights_{correct}=weights_{correct}\times e^{-\alpha_{1}}$$` -- `$$New\;weights_{correct}=0.125\times e^{-0.973}=0.047$$` --- #AdaBoosting <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> <th style="text-align:center;"> g1 </th> <th style="text-align:center;"> New_Weights </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 205 </td> <td style="text-align:center;font-weight: 
bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 210 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: red !important;">No</span> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 156 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: 
#243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> </tbody> </table> `$$New\;weights_{correct}=weights_{correct}\times e^{-\alpha_{1}}$$` `$$New\;weights_{correct}=0.125\times e^{-0.973}=0.047$$` --- #Weighting Correct <img src="xaringan_presentation_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> --- #AdaBoosting <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> <th style="text-align:center;"> g1 </th> <th style="text-align:center;"> New_Weights </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 205 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td 
style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 210 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: red !important;">No</span> </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 156 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td 
style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> </tbody> </table> -- `$$New\;weights_{wrong}=weights_{wrong}\times e^{\alpha_{1}}$$` -- `$$New\;weights_{wrong}=0.125\times e^{0.973}=0.331$$` --- #AdaBoosting <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> <th style="text-align:center;"> g1 </th> <th style="text-align:center;"> New_Weights </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 205 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td 
style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 210 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: red !important;">No</span> </td> <td style="text-align:center;"> 0.331 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 156 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td 
style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> </tbody> </table> `$$New\;weights_{wrong}=weights_{wrong}\times e^{\alpha_{1}}$$` `$$New\;weights_{wrong}=0.125\times e^{0.973}=0.331$$` --- #Weighting Wrong <img src="xaringan_presentation_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> --- #AdaBoosting <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> <th style="text-align:center;"> g1 </th> <th style="text-align:center;"> New_Weights </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 205 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 
Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 210 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: red !important;">No</span> </td> <td style="text-align:center;"> 0.331 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 156 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> 
<td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.047 </td> </tr> </tbody> </table> --- #AdaBoosting <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> <th style="text-align:center;"> g1 </th> <th style="text-align:center;"> Norm_Weights </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 205 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.071 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.071 </td> </tr> <tr> <td 
style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 210 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> <td style="text-align:center;"> 0.071 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: red !important;">No</span> </td> <td style="text-align:center;"> 0.502 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 156 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.071 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.071 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.071 </td> </tr> <tr> <td 
style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> <td style="text-align:center;"> 0.071 </td> </tr> </tbody> </table> --- #AdaBoosting <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td 
style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr> </tbody> </table> --- #AdaBoosting <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> <td 
style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> <tr> <td style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;font-weight: bold;color: white !important;background-color: #3454D1 !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A 
!important;"> Yes </td> </tr> </tbody> </table> --- # AdaBoosting
```r
# Fit a decision stump (depth-1 tree) on the reweighted sample
g2 <- rpart(Heart_Disease ~ Chest_Pain + Blocked_Arteries + Patient_Weight,
            data = reweighted_data,
            control = rpart.control(maxdepth = 1, minsplit = 2))
rpart.plot(g2)
```
<img src="xaringan_presentation_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" /> --- #AdaBoosting <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Chest_Pain </th> <th style="text-align:center;"> Blocked_Arteries </th> <th style="text-align:center;"> Patient_Weight </th> <th style="text-align:center;"> Heart_Disease </th> <th style="text-align:center;"> g2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 172 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: red !important;">Yes</span> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: 
bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 168 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">No</span> </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 167 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 125 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: red !important;">Yes</span> </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 180 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> <span style=" color: green !important;">Yes</span> </td> </tr> </tbody> </table> -- `$$\alpha_{2}=\frac{1}{2}\log\left(\frac{1-\epsilon_{2}}{\epsilon_{2}} \right)$$` -- `$$\alpha_{2}=\frac{1}{2}\log\left(\frac{1-\epsilon_{2}}{\epsilon_{2}} \right)=\frac{1}{2}\log\left(\frac{1-0.25}{0.25} \right)=0.549$$` --- #AdaBoosting <br/> <br/> `$$G(\mathbf{x})=sign \left(\sum_{m=1}^{M} \alpha_{m}g_{m}(\mathbf{x}) \right)$$` -- <br/> <br/> `$$G(\mathbf{x_{i}})=sign 
\left(\alpha_{1}g_{1}(\mathbf{x}_{i})+\alpha_{2}g_{2}(\mathbf{x}_{i})+\dots+ \alpha_{M}g_{M}(\mathbf{x_{i}}) \right)$$` -- <br/> <br/> `$$G(\mathbf{x_{i}})=sign \left(1\times(\alpha_{1,\text{Yes}}+\dots+\alpha_{k,\text{Yes}})+(-1)\times (\alpha_{2,\text{No}}+\dots+\alpha_{p,\text{No}})\right)$$` --- class: inverse, center, middle # Simulated Data --- # Simulated Data .center[] --- # Simulated Data .center[] --- # Simulated Data .center[] --- class: center, middle, inverse # Gradient Boosting --- # Gradient Boosting Find the function `\(\hat{G}\)` that minimizes the loss function # `$$L(\mathbf{\hat{G},y})=\sum_{i=1}^{N} L(y_{i},\hat{G}(x_{i}))$$` --- ## Gradient Descent One of the simplest optimization algorithms is gradient descent, also called steepest descent. Given a function `\(f(\boldsymbol{\theta})\)`, the value of `\(\boldsymbol{\theta}\)` that minimizes `\(f\)` can be approached iteratively, with step size `\(\eta_{k}\)`, by -- <br/> <br/> `$$\boldsymbol{\theta_{k+1}}=\boldsymbol{\theta_{k}}-\eta_{k}\mathbf{g_{k}}=\boldsymbol{\theta_{k}}-\eta_{k}\nabla_{\theta}f(\boldsymbol{\theta_{k}})$$` -- .center[] --- # Gradient Boosting Applying gradient descent to the loss function, the `\(\mathbf{\hat{G}}\)` that minimizes it can be approached iteratively by # `$$\mathbf{\hat{G}}_{m+1}=\mathbf{\hat{G}}_{m}-\eta_{m}\mathbf{g_{m}}$$` -- where ## `$$g_{im}=\left[ \frac{\partial L(y_{i},G(x_{i}))}{\partial G(x_{i})} \right]\Big|_{G(x_{i})=G_{m}(x_{i})}$$` --- #Gradient Boosting vs. 
AdaBoost .pull-left[] .pull-right[] --- class: inverse, center, middle # Regression Gradient Boosting --- # Gradient Boosting Regression <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Height </th> <th style="text-align:center;"> Favorite_Colour </th> <th style="text-align:center;"> Gender </th> <th style="text-align:center;"> Weight </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 88 </td> </tr> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 76 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 56 </td> </tr> <tr> <td style="text-align:center;"> 1.8 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 73 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 77 </td> </tr> <tr> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 57 </td> </tr> </tbody> </table> --- ## First Round <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th 
style="text-align:center;"> Height </th> <th style="text-align:center;"> Favorite_Colour </th> <th style="text-align:center;"> Gender </th> <th style="text-align:center;"> Weight </th> <th style="text-align:center;"> G_0 </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 88 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 71.17 </td> </tr> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 76 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 71.17 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 56 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 71.17 </td> </tr> <tr> <td style="text-align:center;"> 1.8 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 73 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 71.17 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 77 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 71.17 </td> </tr> <tr> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;font-weight: 
bold;color: #243B4A !important;"> 57 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 71.17 </td> </tr> </tbody> </table> --- ## First Round <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Height </th> <th style="text-align:center;"> Favorite_Colour </th> <th style="text-align:center;"> Gender </th> <th style="text-align:center;"> Weight </th> <th style="text-align:center;"> G_0 </th> <th style="text-align:center;"> Pseudo_Residuals </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 88 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 16.83 </td> </tr> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 76 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 4.83 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 56 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -15.17 </td> </tr> <tr> <td style="text-align:center;"> 1.8 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 73 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 1.83 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Male </td> <td 
style="text-align:center;"> 77 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 5.83 </td> </tr> <tr> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 57 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -14.17 </td> </tr> </tbody> </table> --- ## Tree Model for the Pseudo-Residuals
```r
library(rpart)
library(rpart.plot)
# Fit a shallow regression tree to the first-round pseudo-residuals
g0 <- rpart(Pseudo_Residuals ~ Height + Favorite_Colour + Gender,
            data = grad_weight,
            control = rpart.control(maxdepth = 2, minsplit = 2))
rpart.plot(g0)
```
<img src="xaringan_presentation_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /> --- ## First Round <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Height </th> <th style="text-align:center;"> Favorite_Colour </th> <th style="text-align:center;"> Gender </th> <th style="text-align:center;"> Weight </th> <th style="text-align:center;"> G_0 </th> <th style="text-align:center;"> Pseudo_Residuals </th> <th style="text-align:center;"> g_0 </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 88 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> 16.83 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 16.83 </td> </tr> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 76 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> 4.83 </td> <td style="text-align:center;font-weight: 
bold;color: #243B4A !important;"> 4.83 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 56 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> -15.17 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -14.67 </td> </tr> <tr> <td style="text-align:center;"> 1.8 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 73 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> 1.83 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 3.83 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 77 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> 5.83 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 3.83 </td> </tr> <tr> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 57 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> -14.17 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -14.67 </td> </tr> </tbody> </table> --- ## First Round <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Height </th> <th style="text-align:center;"> Favorite_Colour </th> <th style="text-align:center;"> Gender </th> <th style="text-align:center;"> Weight </th> <th style="text-align:center;"> G_0 </th> <th style="text-align:center;"> Pseudo_Residuals </th> <th style="text-align:center;"> g_0 </th> <th style="text-align:center;"> G_1 
</th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 88 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> 16.83 </td> <td style="text-align:center;"> 16.83 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 72.853 </td> </tr> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 76 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> 4.83 </td> <td style="text-align:center;"> 4.83 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 71.653 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 56 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> -15.17 </td> <td style="text-align:center;"> -14.67 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 69.703 </td> </tr> <tr> <td style="text-align:center;"> 1.8 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 73 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> 1.83 </td> <td style="text-align:center;"> 3.83 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 71.553 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 77 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> 5.83 </td> <td style="text-align:center;"> 3.83 </td> <td style="text-align:center;font-weight: bold;color: 
#243B4A !important;"> 71.553 </td> </tr> <tr> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 57 </td> <td style="text-align:center;"> 71.17 </td> <td style="text-align:center;"> -14.17 </td> <td style="text-align:center;"> -14.67 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 69.703 </td> </tr> </tbody> </table> --- ## Second Round <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Height </th> <th style="text-align:center;"> Favorite_Colour </th> <th style="text-align:center;"> Gender </th> <th style="text-align:center;"> Weight </th> <th style="text-align:center;"> G_1 </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 88 </td> <td style="text-align:center;"> 72.853 </td> </tr> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 76 </td> <td style="text-align:center;"> 71.653 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 56 </td> <td style="text-align:center;"> 69.703 </td> </tr> <tr> <td style="text-align:center;"> 1.8 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 73 </td> <td style="text-align:center;"> 71.553 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td 
style="text-align:center;"> Green </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 77 </td> <td style="text-align:center;"> 71.553 </td> </tr> <tr> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 57 </td> <td style="text-align:center;"> 69.703 </td> </tr> </tbody> </table> --- ## Second Round <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Height </th> <th style="text-align:center;"> Favorite_Colour </th> <th style="text-align:center;"> Gender </th> <th style="text-align:center;"> Weight </th> <th style="text-align:center;"> G_1 </th> <th style="text-align:center;"> Pseudo_Residuals </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 88 </td> <td style="text-align:center;"> 72.853 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 15.147 </td> </tr> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 76 </td> <td style="text-align:center;"> 71.653 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 4.347 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 56 </td> <td style="text-align:center;"> 69.703 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -13.703 </td> </tr> <tr> <td style="text-align:center;"> 1.8 </td> <td style="text-align:center;"> Red </td> <td 
style="text-align:center;"> Male </td> <td style="text-align:center;"> 73 </td> <td style="text-align:center;"> 71.553 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 1.447 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 77 </td> <td style="text-align:center;"> 71.553 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 5.447 </td> </tr> <tr> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 57 </td> <td style="text-align:center;"> 69.703 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -12.703 </td> </tr> </tbody> </table> --- ## Tree Model for the Pseudo-Residuals
```r
library(rpart)
library(rpart.plot)
# Fit a shallow regression tree to the second-round pseudo-residuals
g1 <- rpart(Pseudo_Residuals ~ Height + Favorite_Colour + Gender,
            data = new_grad_weight,
            control = rpart.control(maxdepth = 2, minsplit = 2))
rpart.plot(g1)
```
<img src="xaringan_presentation_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" /> --- ## Second Round <table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Height </th> <th style="text-align:center;"> Favorite_Colour </th> <th style="text-align:center;"> Gender </th> <th style="text-align:center;"> Weight </th> <th style="text-align:center;"> G_1 </th> <th style="text-align:center;"> Pseudo_Residuals </th> <th style="text-align:center;"> g_1 </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 88 </td> <td style="text-align:center;"> 72.853 </td> <td style="text-align:center;"> 15.147 </td> <td 
style="text-align:center;font-weight: bold;color: #243B4A !important;"> 15.147 </td> </tr> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 76 </td> <td style="text-align:center;"> 71.653 </td> <td style="text-align:center;"> 4.347 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 4.347 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 56 </td> <td style="text-align:center;"> 69.703 </td> <td style="text-align:center;"> -13.703 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -13.203 </td> </tr> <tr> <td style="text-align:center;"> 1.8 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 73 </td> <td style="text-align:center;"> 71.553 </td> <td style="text-align:center;"> 1.447 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 3.447 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 77 </td> <td style="text-align:center;"> 71.553 </td> <td style="text-align:center;"> 5.447 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 3.447 </td> </tr> <tr> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 57 </td> <td style="text-align:center;"> 69.703 </td> <td style="text-align:center;"> -12.703 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -13.203 </td> </tr> </tbody> </table> --- ## Second Round <table class="table table-striped" style="width: auto !important; 
margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Height </th> <th style="text-align:center;"> Favorite_Colour </th> <th style="text-align:center;"> Gender </th> <th style="text-align:center;"> Weight </th> <th style="text-align:center;"> G_1 </th> <th style="text-align:center;"> Pseudo_Residuals </th> <th style="text-align:center;"> g_1 </th> <th style="text-align:center;"> G_2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 88 </td> <td style="text-align:center;"> 72.853 </td> <td style="text-align:center;"> 15.147 </td> <td style="text-align:center;"> 15.147 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 74.37 </td> </tr> <tr> <td style="text-align:center;"> 1.6 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 76 </td> <td style="text-align:center;"> 71.653 </td> <td style="text-align:center;"> 4.347 </td> <td style="text-align:center;"> 4.347 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 72.09 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 56 </td> <td style="text-align:center;"> 69.703 </td> <td style="text-align:center;"> -13.703 </td> <td style="text-align:center;"> -13.203 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 68.38 </td> </tr> <tr> <td style="text-align:center;"> 1.8 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 73 </td> <td style="text-align:center;"> 71.553 </td> <td style="text-align:center;"> 1.447 </td> <td style="text-align:center;"> 3.447 </td> <td 
style="text-align:center;font-weight: bold;color: #243B4A !important;"> 71.90 </td> </tr> <tr> <td style="text-align:center;"> 1.5 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Male </td> <td style="text-align:center;"> 77 </td> <td style="text-align:center;"> 71.553 </td> <td style="text-align:center;"> 5.447 </td> <td style="text-align:center;"> 3.447 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 71.90 </td> </tr> <tr> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Female </td> <td style="text-align:center;"> 57 </td> <td style="text-align:center;"> 69.703 </td> <td style="text-align:center;"> -12.703 </td> <td style="text-align:center;"> -13.203 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 68.38 </td> </tr> </tbody> </table>

---

#Pseudo Code

__Input__: Data `\(\{(x_{i},y_{i})\}_{i=1}^{n}\)`, and a differentiable __Loss Function__ `\(L(y_{i},G(x_{i}))\)`

--

__Step 1__: Initialize the model with a constant value: `\(G_{0}(x)=\underset{\gamma}{\arg\min} \sum_{i=1}^{n}L(y_{i},\gamma)\)`

--

__Step 2__: for `\(m=1\)` to `\(M\)`

`\(\hspace{1cm}\)`__(a)__ Compute the pseudo-residuals `\(r_{im}=-\left[ \frac{\partial L(y_{i},G(x_{i}))}{\partial G(x_{i})} \right]\Big|_{G(x_{i})=G_{m-1}(x_{i})}\)` for `\(i=1,\dots,n\)`

`\(\hspace{1cm}\)`__(b)__ Fit a regression tree to the `\(r_{im}\)` values and create terminal regions `\(R_{jm}\)`, for `\(j=1,\dots,J_{m}\)`

`\(\hspace{1cm}\)`__(c)__ For `\(j=1,\dots,J_{m}\)` compute `\(\gamma_{jm}=\underset{\gamma}{\arg\min}\sum_{x_{i} \in R_{jm}} L(y_{i},G_{m-1}(x_{i})+\gamma)\)`

`\(\hspace{1cm}\)`__(d)__ Update `\(G_{m}(\mathbf{x})=G_{m-1}(\mathbf{x}) +\eta \sum_{j=1}^{J_m} \gamma_{jm} I(\mathbf{x} \in R_{jm})\)`

--

__Step 3__: Output `\(G_{M}(x)\)`

---
class: inverse, center, middle

# Classification Gradient Boosting

---

## Gradient Boosting Classification

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Likes_Popcorn </th> <th style="text-align:center;"> Age </th> <th style="text-align:center;"> Favorite_Color </th> <th style="text-align:center;"> Loves_Trolls_2 </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 12 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 87 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 44 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> No </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 32 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> Blue </td> <td
style="text-align:center;font-weight: bold;color: #243B4A !important;"> Yes </td> </tr> </tbody> </table>

Calculate the log of the odds (4 Yes against 2 No):

`$$\log(odds)=\log\left(\frac{4}{2}\right)\approx 0.7$$`

Probability of __Loving Trolls 2__ = `\(\frac{e^{\log(odds)}}{1+e^{\log(odds)}}\approx 0.7\)`

---

## Gradient Boosting Classification

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Likes_Popcorn </th> <th style="text-align:center;"> Age </th> <th style="text-align:center;"> Favorite_Color </th> <th style="text-align:center;"> Loves_Trolls_2 </th> <th style="text-align:center;"> G_0 </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 12 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.7 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 87 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.7 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 44 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.7 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.7 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 32 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.7
</td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.7 </td> </tr> </tbody> </table>

---

## Gradient Boosting Classification

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Likes_Popcorn </th> <th style="text-align:center;"> Age </th> <th style="text-align:center;"> Favorite_Color </th> <th style="text-align:center;"> Loves_Trolls_2 </th> <th style="text-align:center;"> G_0 </th> <th style="text-align:center;"> Residuals </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 12 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.3 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 87 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.3 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 44 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -0.7 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -0.7 </td> </tr> <tr> <td
style="text-align:center;"> No </td> <td style="text-align:center;"> 32 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.3 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.3 </td> </tr> </tbody> </table>

---

## Modeling Tree Residuals

```r
library(rpart)
library(rpart.plot)

# Fit a shallow regression tree to the residuals of the initial prediction
g0 <- rpart(Residuals ~ Likes_Popcorn + Age + Favorite_Color,
            data = grad_trolls,
            control = rpart.control(maxdepth = 2, minsplit = 2))

# Plot the fitted tree
rpart.plot(g0)
```

<img src="xaringan_presentation_files/figure-html/unnamed-chunk-32-1.png" style="display: block; margin: auto;" />

--

The output value of each leaf is given by

`$$\frac{\sum \text{Residual}_{i}}{\sum \text{Previous Probability}_{i}\times(1-\text{Previous Probability}_{i})}$$`

---

## Gradient Boosting Classification

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Likes_Popcorn </th> <th style="text-align:center;"> Age </th> <th style="text-align:center;"> Favorite_Color </th> <th style="text-align:center;"> Loves_Trolls_2 </th> <th style="text-align:center;"> G_0 </th> <th style="text-align:center;"> Residuals </th> <th style="text-align:center;"> g_0 </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 12 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> 0.3 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 1.4 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 87
</td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> 0.3 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -1.0 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 44 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> -0.7 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -1.0 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> -0.7 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -3.3 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 32 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> 0.3 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 1.4 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> 0.3 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 1.4 </td> </tr> </tbody> </table>

--

The new predicted log-odds will be given by

`$$G_{1}=G_{0}+\eta \times g_{0}$$`

---

## Gradient Boosting Classification

Considering `\(\eta = 0.8\)`

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Likes_Popcorn </th> <th
style="text-align:center;"> Age </th> <th style="text-align:center;"> Favorite_Color </th> <th style="text-align:center;"> Loves_Trolls_2 </th> <th style="text-align:center;"> G_0 </th> <th style="text-align:center;"> Residuals </th> <th style="text-align:center;"> g_0 </th> <th style="text-align:center;"> G_1 </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 12 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> 0.3 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 1.82 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 87 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> 0.3 </td> <td style="text-align:center;"> -1.0 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -0.10 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 44 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> -0.7 </td> <td style="text-align:center;"> -1.0 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -0.10 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> -0.7 </td> <td style="text-align:center;"> -3.3 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> -1.94 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 32 
</td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> 0.3 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 1.82 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 0.7 </td> <td style="text-align:center;"> 0.3 </td> <td style="text-align:center;"> 1.4 </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 1.82 </td> </tr> </tbody> </table>

--

The new probabilities will be given by `\(Prob_{new1}=\frac{e^{\log(odds)}}{1+e^{\log(odds)}}=\frac{e^{G_{1}}}{1+e^{G_{1}}}\)`

---

## Gradient Boosting Classification

<table class="table table-striped" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Likes_Popcorn </th> <th style="text-align:center;"> Age </th> <th style="text-align:center;"> Favorite_Color </th> <th style="text-align:center;"> Loves_Trolls_2 </th> <th style="text-align:center;"> New_Probs </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 12 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.86 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 87 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.48 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 44 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> No
</td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.48 </td> </tr> <tr> <td style="text-align:center;"> Yes </td> <td style="text-align:center;"> 19 </td> <td style="text-align:center;"> Red </td> <td style="text-align:center;"> No </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.13 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 32 </td> <td style="text-align:center;"> Green </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.86 </td> </tr> <tr> <td style="text-align:center;"> No </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> Blue </td> <td style="text-align:center;"> Yes </td> <td style="text-align:center;font-weight: bold;color: #243B4A !important;"> 0.86 </td> </tr> </tbody> </table>

---

#Pseudo Code

__Input__: Data `\(\{(x_{i},y_{i})\}_{i=1}^{n}\)`, and a differentiable __Loss Function__ `\(L(y_{i},G(x_{i}))\)`

--

__Step 1__: Initialize the model with a constant value: `\(G_{0}(x)=\underset{\gamma}{\arg\min} \sum_{i=1}^{n}L(y_{i},\gamma)\)`

--

__Step 2__: for `\(m=1\)` to `\(M\)`

`\(\hspace{1cm}\)`__(a)__ Compute the pseudo-residuals `\(r_{im}=-\left[ \frac{\partial L(y_{i},G(x_{i}))}{\partial G(x_{i})} \right]\Big|_{G(x_{i})=G_{m-1}(x_{i})}\)` for `\(i=1,\dots,n\)`

`\(\hspace{1cm}\)`__(b)__ Fit a regression tree to the `\(r_{im}\)` values and create terminal regions `\(R_{jm}\)`, for `\(j=1,\dots,J_{m}\)`

`\(\hspace{1cm}\)`__(c)__ For `\(j=1,\dots,J_{m}\)` compute `\(\gamma_{jm}=\underset{\gamma}{\arg\min}\sum_{x_{i} \in R_{jm}} L(y_{i},G_{m-1}(x_{i})+\gamma)\)`

`\(\hspace{1cm}\)`__(d)__ Update `\(G_{m}(\mathbf{x})=G_{m-1}(\mathbf{x}) +\eta \sum_{j=1}^{J_m} \gamma_{jm} I(\mathbf{x} \in R_{jm})\)`

--

__Step 3__: Output `\(G_{M}(x)\)`

---
background-image: url(code_time.gif)
background-size: cover
class: inverse, center, middle

# Code Time!
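---

## Gradient Boosting Classification: R sketch

The first boosting round of the Trolls 2 example can be traced in a few lines of base R. This is only a sketch under stated assumptions: the leaf membership (`leaf`) is read off the `g_0` column of the tables rather than taken from a fitted tree, the leaf labels are hypothetical, and intermediate values are rounded the way the slides round them so that the numbers line up with the tables.

```r
# Toy data from the slides: Loves_Trolls_2 coded as 1 = Yes, 0 = No.
y <- c(1, 1, 0, 0, 1, 1)

# ASSUMPTION: leaf membership is read off the g_0 column of the slides
# (rows with equal g_0 share a leaf); the labels "A", "B", "C" are made up.
leaf <- c("A", "B", "B", "C", "A", "A")

eta <- 0.8  # learning rate used on the slides

# Initial prediction: log-odds of "Yes", rounded as on the slides.
G0 <- round(log(sum(y == 1) / sum(y == 0)), 1)  # log(4/2)
p0 <- round(plogis(G0), 1)                      # previous probability

# Pseudo-residuals: observed label minus previous probability.
res <- y - p0

# Leaf output: sum of residuals over sum of p * (1 - p) within each leaf.
p_prev <- rep(p0, length(y))
gamma <- tapply(res, leaf, sum) / tapply(p_prev * (1 - p_prev), leaf, sum)
g0 <- round(as.numeric(gamma[leaf]), 1)

# Update the log-odds and convert back to probabilities.
G1 <- G0 + eta * g0
p1 <- plogis(G1)

round(data.frame(y, res, g0, G1, p1), 2)
```

Dropping the three `round()` calls gives the unrounded leaf values 1.5, -0.75 and -3, so the resulting probabilities differ slightly from the ones in the tables.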
--- <!-- background-image: url(mind_blow.gif) --> <!-- background-size: cover --> <!-- class: inverse, center, middle --> <!-- # XGBoost: A scalable Tree Boosting System --> <!-- --- --> <!-- #The Competitions Winner --> <!-- <br/> --> <!-- <br/> --> <!-- <br/> --> <!-- <br/> --> <!-- <br/> --> <!-- ####.center[Among the 29 challenge winning solutions 3 published at Kaggle’s blog during 2015, 17 solutions used XGBoost] --> <!-- --- --> <!-- # The objective function --> <!-- `$$Obj^{m}=\sum_{i=1}^{N}L(y_{i},\hat{G}^{(t)}(x_{i}))+\sum_{i=1}^{m}\Omega(g_{i})$$` --> <!-- where --> <!-- `$$\Omega(g)=\gamma T+\frac{1}{2} \lambda ||w||^{2}$$` --> <!-- and `\(T\)` is the __number of leafes__ in each tree and `\(w\)` is the score from each leaf., `\(\lambda\)` is the regularization parameter. --> <!-- - Now the optimal score to each leaf is given by --> <!-- `$$w^{*}_{j}=- \frac{\sum_{i \in I_{j}}g_{i}}{\sum_{i \in I{j}}h_{i}+\lambda}$$` --> <!-- --- --> <!-- ## How find the good structure for the tree --> <!-- The gain of split is --> <!--  --> <!-- --- --> <!-- ## Others majors contibutions from the XGBoost --> <!-- - Designed and build a highly scalable end-to-end tree --> <!-- boosting system. --> <!-- - Proposed a theoretically justified weighted quantile --> <!-- sketch for efficient proposal calculation (Find better and faster candidate splits). --> <!-- - Introduced a novel sparsity-aware algorithm for parallel tree learning. --> <!-- --- --> <!-- background-size: cover --> <!-- class: inverse, center, middle --> <!-- # XGBoost: Study Case --> <!-- --- --> <!-- class: center, middle,inverse --> <!-- # Questions? --> <!-- mateusmaia11@gmail.com --> <!-- <br/> --> <!-- <br/> --> <!-- To learn more with XGBoosting --> <!-- xgboost.readthedocs.io/en/latest/build.html --> <!-- To explore more the AdaBoosting --> <!-- mateusmaia.shinyapps.io/adaboosting/ --> <!-- Know the Laboratory in which I participate --> <!-- led.ufba.br -->
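
## Gradient Boosting Regression: R sketch

Steps 1 to 3 of the pseudo code can be sketched for squared-error loss, where the pseudo-residuals are the ordinary residuals and the leaf values are the leaf means of those residuals. The data frame `d` is copied from the regression tables; the number of rounds `M`, the tree settings and `xval = 0` (which switches off rpart's cross-validation so the run is deterministic) are assumptions of this sketch, not settings taken from the slides.

```r
library(rpart)

# Toy data copied from the regression slides.
d <- data.frame(
  Height = c(1.6, 1.6, 1.5, 1.8, 1.5, 1.4),
  Favorite_Colour = factor(c("Blue", "Green", "Blue", "Red", "Green", "Blue")),
  Gender = factor(c("Male", "Female", "Female", "Male", "Male", "Female")),
  Weight = c(88, 76, 56, 73, 77, 57)
)

eta <- 0.1  # learning rate
M <- 50     # number of boosting rounds (arbitrary choice for this sketch)

# Step 1: for squared loss the best constant is the mean of the response.
G <- rep(mean(d$Weight), nrow(d))
mse_start <- mean((d$Weight - G)^2)

for (m in seq_len(M)) {
  # Step 2(a): for L = (y - G)^2 / 2 the negative gradient is the residual.
  d$r <- d$Weight - G
  # Step 2(b)-(c): a small regression tree; its leaf means are the gammas.
  tree <- rpart(r ~ Height + Favorite_Colour + Gender, data = d,
                control = rpart.control(maxdepth = 2, minsplit = 2, xval = 0))
  # Step 2(d): shrink the tree's prediction and add it to the ensemble.
  G <- G + eta * predict(tree, newdata = d)
}

# Step 3: G now holds the fitted values of the final model G_M.
mse_end <- mean((d$Weight - G)^2)
c(start = mse_start, end = mse_end)
```

With a small `eta` each round removes only a fraction of the remaining residual, which is why many rounds are needed before the training error becomes small.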