Perform time and space evaluation of xgboost models on CPU or GPU
Run an xgboost model with time and memory usage tracking. The purpose of this function is to gather simple resource metrics, namely runtime and memory usage, for a single model.
Arguments
- train_data
an xgboost xgb.DMatrix object of training data (see the construction sketch after this argument list)
- test_data
an xgboost xgb.DMatrix object of testing data
- param
a named list of xgboost parameters. See the xgboost documentation.
- rounds
the number of boosting rounds. See the xgboost documentation; this is the main driver of runtime.
- verbose
if TRUE, set xgboost verbosity to 1; otherwise 0 (no messages). Defaults to FALSE.
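A minimal sketch of assembling these inputs by hand with the standard xgboost API is shown below; the data dimensions and parameter values are arbitrary placeholders, not package defaults.

library(xgboost)

# toy regression data; the dimensions here are arbitrary placeholders
x <- matrix(rnorm(500 * 19), ncol = 19)
y <- rnorm(500)
train_idx <- sample(seq_len(500), 400)

# train_data and test_data are expected to be xgb.DMatrix objects
train_dmat <- xgb.DMatrix(data = x[train_idx, ], label = y[train_idx])
test_dmat <- xgb.DMatrix(data = x[-train_idx, ], label = y[-train_idx])

# param is a named list of xgboost parameters
param <- list(objective = 'reg:squarederror',
              eval_metric = 'mae',
              tree_method = 'hist')

perf_test_xgboost(train_dmat, test_dmat, param, 5)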
Note
caret, for example, offers speed-ups that make cross validation faster by sharing data between fold models. This means it may be slightly faster to run 5-fold CV through caret than to run 5 individual models outside of caret.
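As a rough illustration of this comparison (not code from the package), the sketch below times 5-fold cross validation through caret against five separately trained xgboost models on the same folds; the synthetic data, fold creation, and tuning values are all placeholder assumptions, and caret additionally fits a final model on the full data, so the timings are only approximate.

library(caret)
library(xgboost)

set.seed(42)
x <- matrix(rnorm(1000 * 19), ncol = 19)
colnames(x) <- paste0('V', 1:19)
y <- rnorm(1000)

# single-row tuning grid so caret fits exactly one model per fold
grid <- data.frame(nrounds = 50, max_depth = 6, eta = 0.3, gamma = 0,
                   colsample_bytree = 1, min_child_weight = 1, subsample = 0.5)

# 5-fold CV handled inside caret
caret_time <- system.time(
  train(x, y, method = 'xgbTree',
        trControl = trainControl(method = 'cv', number = 5),
        tuneGrid = grid)
)

# the same five fold models trained one at a time outside of caret
folds <- createFolds(y, k = 5)
manual_time <- system.time(
  for (idx in folds) {
    dtrain <- xgb.DMatrix(x[-idx, ], label = y[-idx])
    xgb.train(params = list(objective = 'reg:squarederror', max_depth = 6,
                            eta = 0.3, subsample = 0.5),
              data = dtrain, nrounds = 50, verbose = 0)
  }
)

rbind(caret = caret_time, manual = manual_time)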
Examples
test_data = readRDS(
  system.file('testing_gene_data.rds',
              package = 'brentlabModelPerfTesting'))
# suppressWarnings only here b/c test data is too small
# and stratified split is turned off. In general, do not suppressWarnings
suppressWarnings({input_data = prep_data(test_data)})
#> INFO [2023-02-16 20:10:26] creating train test data with 19 predictor variables
param <- list(
  objective = 'reg:squarederror',
  eval_metric = 'mae',
  subsample = 0.5,
  nthread = 1, # expecting to be overwritten
  max_depth = 10, # expecting to be overwritten
  max_bin = 10, # expecting to be overwritten
  tree_method = 'hist')
perf_test_xgboost(input_data$train, input_data$test, param, 5)
#> time_sec model_ram_mb total_mem_used_gb
#> 1 0.5417812 1 0.16
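# Roughly comparable numbers can be collected by hand. This is only a sketch
# of the kind of measurement reported above, not necessarily how
# perf_test_xgboost is implemented internally:
elapsed <- system.time(
  fit <- xgboost::xgb.train(params = param, data = input_data$train, nrounds = 5)
)[['elapsed']]
model_mb <- as.numeric(object.size(fit)) / 1024^2
c(time_sec = elapsed, model_ram_mb = model_mb)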
if (FALSE) {
# using the gpu
test_data = readRDS(
  system.file('testing_gene_data.rds',
              package = 'brentlabModelPerfTesting'))
# suppressWarnings only here b/c test data is too small
# and stratified split is turned off. In general, do not suppressWarnings
suppressWarnings({input_data = prep_data(test_data)})
# this is an example using the GPU. Note that
# the number of rounds has been increased so that,
# if you're using an NVIDIA GPU, you can watch this run with
# watch -n0.1 nvidia-smi
param <- list(
  objective = 'reg:squarederror',
  eval_metric = 'mae',
  subsample = 0.5,
  max_depth = 10, # expecting to be overwritten
  max_bin = 10, # expecting to be overwritten
  tree_method = 'gpu_hist')
perf_test_xgboost(input_data$train, input_data$test, param, 1000)
}
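# Because rounds is the main driver of runtime, a simple scaling check can be
# run by sweeping rounds and binding the returned one-row data frames
# (this assumes the return value is the one-row data frame shown above):
scaling <- do.call(
  rbind,
  lapply(c(5, 10, 20), function(r) {
    cbind(rounds = r,
          perf_test_xgboost(input_data$train, input_data$test, param, r))
  })
)
scaling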