Run an xgboost model with time and memory usage tracking. The purpose is to gather simple resource metrics, namely runtime and memory usage, on a single model.

Usage

perf_test_xgboost(train_data, test_data, param, rounds, verbose = FALSE)

Arguments

train_data

an xgboost xgb.DMatrix object of training data

test_data

an xgboost xgb.DMatrix object of testing data

param

a list of xgboost parameters. See the xgboost documentation for the available settings

rounds

the number of boosting rounds. See the xgboost documentation. This is the main driver of runtime

verbose

if TRUE, set xgboost verbosity to 1; otherwise 0 (no messages). Defaults to FALSE

Value

a data.frame with columns time_sec, model_ram_mb, and total_mem_used_gb
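Because each call returns a one-row data.frame, results from repeated runs can be stacked and summarized with base R. A minimal sketch (the metric values below are illustrative placeholders, not real measurements from perf_test_xgboost):

```r
# Hypothetical results from three repeated perf_test_xgboost() calls;
# the numbers are made up for illustration only.
runs <- list(
  data.frame(time_sec = 0.54, model_ram_mb = 1, total_mem_used_gb = 0.16),
  data.frame(time_sec = 0.52, model_ram_mb = 1, total_mem_used_gb = 0.16),
  data.frame(time_sec = 0.58, model_ram_mb = 1, total_mem_used_gb = 0.17)
)

# stack the one-row data.frames, then compute a mean and sd per metric
results <- do.call(rbind, runs)
summary_stats <- data.frame(
  metric = names(results),
  mean   = sapply(results, mean),
  sd     = sapply(results, sd),
  row.names = NULL
)
print(summary_stats)
```

Summarizing over several runs helps separate real timing differences from run-to-run noise before comparing parameter settings.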

Note

caret, for instance, offers speed-ups that make cross validation faster by sharing data between fold models. As a result, running 5-fold CV through caret may be slightly faster than running 5 individual models outside of caret.

Examples

test_data = readRDS(
  system.file('testing_gene_data.rds',
          package = 'brentlabModelPerfTesting'))
# suppressWarnings only here b/c test data is too small
# and stratified split is turned off. In general, do not suppressWarnings
suppressWarnings({input_data = prep_data(test_data)})
#> INFO [2023-02-16 20:10:26] creating train test data with 19 predictor variables

param <- list(
  objective = 'reg:squarederror',
  eval_metric = 'mae',
  subsample = 0.5,
  nthread = 1, # expected to be overwritten
  max_depth = 10, # expected to be overwritten
  max_bin = 10, # expected to be overwritten
  tree_method = 'hist')

perf_test_xgboost(input_data$train, input_data$test, param, 5)
#>    time_sec model_ram_mb total_mem_used_gb
#> 1 0.5417812            1              0.16

if (FALSE) {
# using the gpu
test_data = readRDS(
  system.file('testing_gene_data.rds',
          package = 'brentlabModelPerfTesting'))
# suppressWarnings only here b/c test data is too small
# and stratified split is turned off. In general, do not suppressWarnings
suppressWarnings({input_data = prep_data(test_data)})

# this is an example using the gpu. Note that
# the number of rounds has been increased so that
# if you're using a nvidia gpu, you could watch this run using
# watch -n0.1 nvidia-smi
param <- list(
  objective = 'reg:squarederror',
  eval_metric = 'mae',
  subsample = 0.5,
  max_depth = 10, # expected to be overwritten
  max_bin = 10, # expected to be overwritten
  tree_method = 'gpu_hist')

perf_test_xgboost(input_data$train, input_data$test, param, 1000)
}