
This function is based on caret::train, which fits models (here, different smoothing algorithms) to data across a range of tuning-parameter values (here, different smoothness parameters).

Usage

train_smooth_data(
  ...,
  x = NULL,
  y = NULL,
  sm_method,
  preProcess = NULL,
  weights = NULL,
  metric = ifelse(is.factor(y), "Accuracy", "RMSE"),
  maximize = ifelse(metric %in% c("RMSE", "logLoss", "MAE", "logLoss"), FALSE, TRUE),
  trControl = caret::trainControl(method = "cv"),
  tuneGrid = NULL,
  tuneLength = ifelse(trControl$method == "none", 1, 3),
  return_trainobject = FALSE
)
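The following is a minimal sketch of a typical call. It uses simulated (made-up) growth-curve data and assumes that gcplyr is installed. It also assumes that, for sm_method = "moving-average", the parameter to tune is window_width_n, the window width in data points that is passed on to smooth_data. The grid values are arbitrary.

```r
library(gcplyr)

# Simulated (made-up) data: a logistic growth curve plus Gaussian noise
set.seed(1)
time <- seq(0, 24, by = 0.25)
dens <- 1 / (1 + exp(-0.5 * (time - 10))) + rnorm(length(time), sd = 0.03)

# Cross-validate moving-average smoothing at three window widths
# (window_width_n = number of data points in each averaging window)
cv_results <- train_smooth_data(
  x = time,
  y = dens,
  sm_method = "moving-average",
  tuneGrid = list(window_width_n = c(3, 5, 7))
)

# One row per window width, including the cross-validated RMSE
print(cv_results)
```

Each row of cv_results corresponds to one window_width_n value. Because the default metric is "RMSE", lower values indicate a better cross-validated fit.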

Arguments

...

Arguments passed to smooth_data. These arguments cannot overlap with any of those to be tuned.
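The sketch below shows how fixed arguments and tuned arguments are kept separate. It tunes span for loess while holding degree fixed. It assumes that degree, supplied through ..., is forwarded by smooth_data to stats::loess. Because span is being tuned, it appears only in tuneGrid and not also in .... The data are simulated.

```r
library(gcplyr)

# Simulated (made-up) data: a logistic growth curve plus Gaussian noise
set.seed(1)
time <- seq(0, 24, by = 0.25)
dens <- 1 / (1 + exp(-0.5 * (time - 10))) + rnorm(length(time), sd = 0.03)

loess_results <- train_smooth_data(
  x = time,
  y = dens,
  sm_method = "loess",
  degree = 1,                              # fixed: passed via ... (assumed to reach stats::loess)
  tuneGrid = list(span = c(0.1, 0.2, 0.4)) # tuned: must not also be given in ...
)
print(loess_results)
```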

x

A vector of predictor values to smooth along (e.g. time).

y

A vector of response values to be smoothed (e.g. density).

sm_method

The smoothing method to use. Options are "moving-average", "moving-median", "loess", "gam", and "smooth.spline".

preProcess

A character vector defining how the predictor data should be pre-processed. The default is no pre-processing. See train for more details.

weights

A numeric vector of case weights. This argument currently does not affect any train_smooth_data models.

metric

A string that specifies which summary metric will be used to select the optimal model. By default, possible values for regression are "RMSE", "Rsquared", and "MAE". See train for more details.

maximize

A logical: should the metric be maximized or minimized?

trControl

A list of values that controls how resampling and tuning are performed, typically created with trainControl. See train and trainControl for more details.

tuneGrid

A data frame with possible tuning values, or a named list containing vectors of possible tuning values. If a data frame, its columns should be named after the tuning parameters. If a list, its elements should be named after the tuning parameters, and expand.grid will be used to create all possible combinations of their values.
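The base R sketch below shows how a named list is expanded into every combination of values. The parameter names param_a and param_b are hypothetical placeholders. Which parameters can actually be tuned depends on sm_method.

```r
# Named list of candidate values for two hypothetical tuning parameters
grid_list <- list(param_a = c(0.2, 0.4), param_b = c(1, 2))

# expand.grid() builds all 2 x 2 = 4 combinations; the first column varies fastest
grid_df <- expand.grid(grid_list)
print(grid_df)
```

Supplying grid_df directly as tuneGrid should be equivalent to supplying grid_list, because the list form is expanded in this way.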

tuneLength

An integer denoting the amount of granularity in the tuning parameter grid. By default, this argument is the number of levels for each tuning parameter that should be generated. If trControl has the option search = "random", this is the maximum number of tuning parameter combinations that will be generated by the random search. (NOTE: If given, this argument must be named.)

return_trainobject

A logical indicating whether the entire result of train should be returned, or only the results element.

Value

If return_trainobject = FALSE (the default), a data frame with the values of all tuning parameter combinations and the resampled performance metrics (e.g. RMSE) for each combination (i.e. the results element of the output of train).

If return_trainobject = TRUE, the full object returned by train.
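The sketch below requests the full train object. It assumes that gcplyr is installed and that window_width_n is the tuning parameter for "moving-average". The data are simulated. The results and bestTune elements are standard components of a caret train object.

```r
library(gcplyr)

# Simulated (made-up) data: a logistic growth curve plus Gaussian noise
set.seed(1)
time <- seq(0, 24, by = 0.25)
dens <- 1 / (1 + exp(-0.5 * (time - 10))) + rnorm(length(time), sd = 0.03)

fit <- train_smooth_data(
  x = time,
  y = dens,
  sm_method = "moving-average",
  tuneGrid = list(window_width_n = c(3, 5, 7)),
  return_trainobject = TRUE
)

print(fit$results)  # the data frame returned when return_trainobject = FALSE
print(fit$bestTune) # the window_width_n value selected by the metric
```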

Details

See train for more information.

The default method is k-fold cross-validation (trControl = caret::trainControl(method = "cv")).

For less variable but more computationally costly cross-validation, users may increase the number of folds. This can be done by changing the number argument of trainControl, or by setting method = "LOOCV" for leave-one-out cross-validation, where the number of folds equals the number of data points.

Alternatively, for the same trade-off (less variability at greater computational cost), users may set method = "repeatedcv" for repeated k-fold cross-validation.
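The resampling schemes described above correspond to the following caret::trainControl settings. The fold and repeat counts are arbitrary illustrations. When method = "cv", caret uses 10 folds by default.

```r
# 5-fold cross-validation instead of the default 10 folds
ctrl_cv5 <- caret::trainControl(method = "cv", number = 5)

# Leave-one-out cross-validation: one fold per data point
ctrl_loocv <- caret::trainControl(method = "LOOCV")

# 10-fold cross-validation, repeated 3 times with different random splits
ctrl_rep <- caret::trainControl(method = "repeatedcv", number = 10, repeats = 3)
```

Any of these objects can then be supplied to train_smooth_data, for example as trControl = ctrl_rep.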

For more control, advanced users may wish to call train directly, using makemethod_train_smooth_data to specify the method argument.
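The sketch below calls caret::train directly. It makes several assumptions that should be checked against the makemethod_train_smooth_data reference:

- makemethod_train_smooth_data accepts sm_method and tuneGrid arguments.
- The predictor can be supplied as a one-column data frame (here named x).
- window_width_n is the tuning parameter for "moving-average".

The data are simulated.

```r
library(gcplyr)

# Simulated (made-up) data: a logistic growth curve plus Gaussian noise
set.seed(1)
time <- seq(0, 24, by = 0.25)
dens <- 1 / (1 + exp(-0.5 * (time - 10))) + rnorm(length(time), sd = 0.03)

grid <- data.frame(window_width_n = c(3, 5, 7))

fit <- caret::train(
  x = data.frame(x = time),  # predictor as a one-column data frame (assumed layout)
  y = dens,
  method = makemethod_train_smooth_data(sm_method = "moving-average", tuneGrid = grid),
  tuneGrid = grid,
  trControl = caret::trainControl(method = "cv")
)
print(fit$results)
```

Calling caret::train directly gives access to train arguments that train_smooth_data does not expose.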