
Provides a rank plot of actual and predicted values, grouped into buckets.

Usage

actual_expected_bucketed(
  target_variable,
  model_object,
  data_set = NULL,
  number_of_buckets = 25,
  ylab = "Target",
  width = 800,
  height = 500,
  first_colour = "black",
  second_colour = "#cc4678",
  facetby = NULL,
  prediction_type = "response",
  predict_function = NULL,
  return_data = F
)

Arguments

target_variable

String of target variable name.

model_object

GLM model object.

data_set

Data to score the model on. This can be training or test data, as long as the data is in a form the model object can make predictions on. The ability to supply custom prediction functions is still in development; the current implementation defaults to `stats::predict`.

number_of_buckets

Number of percentile buckets to group the observations into. Defaults to 25.
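
As a rough illustration of what percentile bucketing of predictions involves, the sketch below ranks observations by predicted value, splits them into equal-sized buckets, and averages actual and predicted values in each bucket. It uses simulated data and base R only. It is an assumption about the general technique, not a copy of the prettyglm internals, and the names `sim` and `bucketed` are hypothetical.

```r
# Illustrative sketch only: not prettyglm's internal code.
set.seed(42)

# Simulated data: a binary outcome driven by one predictor.
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
sim <- data.frame(x = x, y = y)

fit <- stats::glm(y ~ x, data = sim, family = binomial(link = "logit"))
sim$predicted <- stats::predict(fit, newdata = sim, type = "response")

# Assign each row to one of `number_of_buckets` equal-sized groups,
# ordered by predicted value (bucket 1 holds the lowest predictions).
number_of_buckets <- 25
sim$bucket <- ceiling(rank(sim$predicted, ties.method = "first") *
                        number_of_buckets / n)

# Average actual and predicted values within each bucket.
bucketed <- stats::aggregate(cbind(actual = y, predicted = predicted) ~ bucket,
                             data = sim, FUN = mean)
head(bucketed)
```

Plotting `actual` and `predicted` against `bucket` gives the kind of rank plot described here: a well-calibrated model shows the two lines tracking each other across buckets.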

ylab

Y-axis label.

width

plotly plot width in pixels.

height

plotly plot height in pixels.

first_colour

First colour to plot, usually the colour of actual.

second_colour

Second colour to plot, usually the colour of predicted.

facetby

Variable to facet the plot by.

prediction_type

Prediction type to be passed to predict.glm if predict_function is NULL. Defaults to "response".
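
To show what the prediction type changes, here is a minimal sketch that calls `stats::predict` directly on a small logistic GLM fitted to the built-in `mtcars` data. The model and the names `p_response` and `p_link` are illustrative assumptions and are not part of prettyglm.

```r
# Illustrative sketch using built-in data; not prettyglm's internal code.
fit <- stats::glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# type = "response": predictions on the scale of the target (probabilities here).
p_response <- stats::predict(fit, newdata = mtcars, type = "response")

# type = "link": predictions on the linear-predictor scale (log-odds here).
p_link <- stats::predict(fit, newdata = mtcars, type = "link")

# The two scales are related through the inverse link function.
all.equal(unname(p_response), unname(stats::plogis(p_link)))
range(p_response)
```

For an actual versus predicted comparison, "response" is usually the natural choice because it puts predictions on the same scale as the observed target.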

predict_function

Prediction function to use. Still in development.

return_data

Logical. If TRUE, returns the cleaned data set instead of the plot.

Value

A plotly plot by default. A ggplot if plotlyplot = F. A tibble if return_data = T.

Examples


library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
library(prettyglm)

data('titanic')

columns_to_factor <- c('Pclass',
                       'Sex',
                       'Cabin',
                       'Embarked',
                       'Cabintype',
                       'Survived')
# Mean age, used to impute missing values
meanage <- base::mean(titanic$Age, na.rm = TRUE)

titanic <- titanic %>%
  # Convert categorical columns to factors
  dplyr::mutate_at(columns_to_factor, list(~factor(.))) %>%
  # Impute missing ages with the mean age
  dplyr::mutate(Age = base::ifelse(is.na(Age), meanage, Age)) %>%
  # Piecewise spline terms for Age and Fare
  dplyr::mutate(Age_0_25 = prettyglm::splineit(Age, 0, 25),
                Age_25_50 = prettyglm::splineit(Age, 25, 50),
                Age_50_120 = prettyglm::splineit(Age, 50, 120)) %>%
  dplyr::mutate(Fare_0_250 = prettyglm::splineit(Fare, 0, 250),
                Fare_250_600 = prettyglm::splineit(Fare, 250, 600))

survival_model <- stats::glm(Survived ~
                               Sex:Age +
                               Fare +
                               Embarked +
                               SibSp +
                               Parch +
                               Cabintype,
                             data = titanic,
                             family = binomial(link = 'logit'))

prettyglm::actual_expected_bucketed(target_variable = 'Survived',
                                    model_object = survival_model,
                                    data_set = titanic)