Run an xgboost model with time and memory usage tracking. The purpose is to gather simple resource metrics, namely runtime and memory usage, on a single model.

Usage

perf_test_xgboost(train_data, test_data, param, rounds, verbose = FALSE)

Arguments

train_data

an xgboost xgb.DMatrix object of training data

test_data

an xgboost xgb.DMatrix object of testing data

param

a list of xgboost parameters. See the xgboost documentation for the available settings

rounds

the number of boosting rounds. See the xgboost documentation. This is the main driver of runtime

verbose

if TRUE, set xgboost verbosity to 1; otherwise 0 (no messages). Defaults to FALSE

Value

a data.frame with columns time_sec, model_ram_mb, and total_mem_used_gb
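Because each call returns a one-row data.frame, results from repeated runs can be stacked and summarized with base R. A minimal sketch (the metric values below are illustrative placeholders, not real measurements from perf_test_xgboost):

```r
# Hypothetical results from three repeated perf_test_xgboost() calls;
# the numbers are made up for illustration only.
runs <- list(
  data.frame(time_sec = 0.54, model_ram_mb = 1, total_mem_used_gb = 0.16),
  data.frame(time_sec = 0.52, model_ram_mb = 1, total_mem_used_gb = 0.16),
  data.frame(time_sec = 0.58, model_ram_mb = 1, total_mem_used_gb = 0.17)
)

# stack the one-row data.frames, then compute a mean and sd per metric
results <- do.call(rbind, runs)
summary_stats <- data.frame(
  metric = names(results),
  mean   = sapply(results, mean),
  sd     = sapply(results, sd),
  row.names = NULL
)
print(summary_stats)
```

Summarizing over several runs helps separate real timing differences from run-to-run noise before comparing parameter settings.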

Note

caret, for instance, offers speed-ups that make cross validation faster by sharing data between fold models. As a result, running 5-fold CV through caret may be slightly faster than running 5 individual models outside of caret.

Examples

test_data = readRDS(
  system.file('testing_gene_data.rds',
          package = 'brentlabModelPerfTesting'))
# suppressWarnings only here b/c test data is too small
# and stratified split is turned off. In general, do not suppressWarnings
suppressWarnings({input_data = prep_data(test_data)})
#> INFO [2023-02-16 20:10:26] creating train test data with 19 predictor variables

param <- list(
  objective = 'reg:squarederror',
  eval_metric = 'mae',
  subsample = 0.5,
  nthread = 1, # expected to be overwritten
  max_depth = 10, # expected to be overwritten
  max_bin = 10, # expected to be overwritten
  tree_method = 'hist')

perf_test_xgboost(input_data$train, input_data$test, param, 5)
#>    time_sec model_ram_mb total_mem_used_gb
#> 1 0.5417812            1              0.16

if (FALSE) {
# using the gpu
test_data = readRDS(
  system.file('testing_gene_data.rds',
          package = 'brentlabModelPerfTesting'))
# suppressWarnings only here b/c test data is too small
# and stratified split is turned off. In general, do not suppressWarnings
suppressWarnings({input_data = prep_data(test_data)})

# this is an example using the gpu. Note that
# the number of rounds has been increased so that
# if you're using a nvidia gpu, you could watch this run using
# watch -n0.1 nvidia-smi
param <- list(
  objective = 'reg:squarederror',
  eval_metric = 'mae',
  subsample = 0.5,
  max_depth = 10, # expected to be overwritten
  max_bin = 10, # expected to be overwritten
  tree_method = 'gpu_hist')

perf_test_xgboost(input_data$train, input_data$test, param, 1000)
}