Smooth data — smooth_data • gcplyr

This function smooths growth curve data by calling one of several underlying smoothing functions.

Usage

smooth_data(
  ...,
  x = NULL,
  y = NULL,
  sm_method,
  subset_by = NULL,
  return_fitobject = FALSE,
  warn_ungrouped = TRUE,
  warn_gam_no_s = TRUE
)

Arguments

...

Arguments passed to loess, gam, moving_average, moving_median, or smooth.spline. Typically includes tuning parameter(s), which in some cases are required. See Details for more information.

x

An (often optional) vector of predictor values to smooth along (e.g. time)

y

A vector of response values to be smoothed (e.g. density). If NULL, formula and data *must* be provided via ...

sm_method

Argument specifying which smoothing method should be used to smooth data. Options include "moving-average", "moving-median", "loess", "gam", and "smooth.spline".

subset_by

An optional vector as long as y. y will be split by the unique values of this vector and the smoothed data for each group will be calculated independently of the others.

This provides an internally implemented approach similar to using group_by and mutate.

return_fitobject

Logical indicating whether the entire object returned by the fitting function should be returned. If FALSE, only the fitted values are returned.

warn_ungrouped

Logical indicating whether a warning should be issued when smooth_data is called on ungrouped data and subset_by = NULL.

warn_gam_no_s

Logical indicating whether a warning should be issued when gam is used without s() in the formula.

Value

If return_fitobject == FALSE:

A vector, the same length as y, with the now-smoothed y values

If return_fitobject == TRUE:

A list the same length as unique(subset_by) where each element is an object of the same class as returned by the smoothing method (typically a named list-like object)
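The two return shapes can be compared with a minimal sketch like the one below. It assumes gcplyr is installed and uses a synthetic two-well data set invented purely for illustration (the names dat, well, time, and density are not part of the package).

```r
library(gcplyr)

# Synthetic data: two noisy logistic growth curves (illustrative only)
set.seed(1)
time <- seq(0, 24, by = 0.5)
n <- length(time)
dat <- data.frame(
  well = rep(c("A1", "A2"), each = n),
  time = rep(time, times = 2),
  density = c(1 / (1 + exp(-(time - 10) / 2)),
              1 / (1 + exp(-(time - 14) / 2))) + rnorm(2 * n, sd = 0.02)
)

# Default: a vector of smoothed values, one per element of y
sm_vals <- smooth_data(x = dat$time, y = dat$density,
                       sm_method = "loess", span = 0.5,
                       subset_by = dat$well)

# return_fitobject = TRUE: a list with one fit object per group
fits <- smooth_data(x = dat$time, y = dat$density,
                    sm_method = "loess", span = 0.5,
                    subset_by = dat$well,
                    return_fitobject = TRUE)

length(sm_vals)  # same length as y
length(fits)     # one element per unique value of subset_by
class(fits[[1]]) # class of the object returned by the smoothing method
```

Because subset_by is supplied here, each well is smoothed independently.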

Details

For moving_average and moving_median, passing window_width or window_width_n via ... is required. window_width sets the width of the moving window in units of x, while window_width_n sets the width in units of number of data points. Larger values for either will produce more "smoothed" data.
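As a minimal sketch of the two ways to set the window, the example below smooths a single synthetic curve (data invented for illustration) once with window_width_n and once with window_width. It assumes gcplyr is installed; warn_ungrouped = FALSE is set only because a single, ungrouped curve is being smoothed.

```r
library(gcplyr)

# Synthetic single growth curve (illustrative data, not from the package)
set.seed(1)
time <- seq(0, 24, by = 0.5)
density <- 1 / (1 + exp(-(time - 12) / 2)) + rnorm(length(time), sd = 0.02)

# Window width given as a number of data points
sm_avg <- smooth_data(x = time, y = density,
                      sm_method = "moving-average",
                      window_width_n = 5,
                      warn_ungrouped = FALSE)

# Window width given in units of x (here, 2.5 time units)
sm_med <- smooth_data(x = time, y = density,
                      sm_method = "moving-median",
                      window_width = 2.5,
                      warn_ungrouped = FALSE)
```

With points spaced 0.5 time units apart, the two window settings above cover a similar stretch of the curve, which makes the average and median results easy to compare.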

For loess, the span argument sets the fraction of data points that should be included in each calculation. It is typically best to specify span explicitly, since the default of 0.75 is often too large for growth curve data. Larger values of span will produce more "smoothed" data.
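The effect of span can be explored with a sketch like the following, which fits the same synthetic curve (data invented for illustration) with a small span and with the 0.75 default. It assumes gcplyr is installed.

```r
library(gcplyr)

# Synthetic single growth curve (illustrative data, not from the package)
set.seed(1)
time <- seq(0, 24, by = 0.5)
density <- 1 / (1 + exp(-(time - 12) / 2)) + rnorm(length(time), sd = 0.02)

# Smaller span: each local fit uses a smaller fraction of the points
sm_span_small <- smooth_data(x = time, y = density,
                             sm_method = "loess", span = 0.2,
                             warn_ungrouped = FALSE)

# span = 0.75 (the loess default): each local fit uses most of the points
sm_span_large <- smooth_data(x = time, y = density,
                             sm_method = "loess", span = 0.75,
                             warn_ungrouped = FALSE)
```

Plotting both results against time is a quick way to judge whether a given span flattens real features of the curve.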

For gam, arguments to both gam and s can be provided via .... Most frequently, the k argument to s is used to set the number of "knots" the spline-fitting can use. Smaller values of k will produce more "smoothed" data.

When using sm_method = "gam", advanced users may also modify other parameters of s(), including the smoothing basis bs. These bases can be thin plate regression splines (bs = "tp", the default), cubic regression splines (bs = "cr"), or many other options (see s). I recommend leaving the default thin plate regression splines, whose main drawback is that they are computationally intensive to calculate. For growth curve data, this is unlikely to be relevant.
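As a rough sketch of tuning gam smoothing through k, the example below fits the same synthetic curve (data invented for illustration) with two different values of k. It assumes gcplyr and mgcv are installed.

```r
library(gcplyr)

# Synthetic single growth curve (illustrative data, not from the package)
set.seed(1)
time <- seq(0, 24, by = 0.5)
density <- 1 / (1 + exp(-(time - 12) / 2)) + rnorm(length(time), sd = 0.02)

# Fewer knots: a more heavily smoothed fit
sm_k5 <- smooth_data(x = time, y = density,
                     sm_method = "gam", k = 5,
                     warn_ungrouped = FALSE)

# More knots: the fit can follow the data more closely
sm_k15 <- smooth_data(x = time, y = density,
                      sm_method = "gam", k = 15,
                      warn_ungrouped = FALSE)
```

Here k is passed through ... and the smoothing basis is left at its default.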

As an alternative to passing y, for more advanced needs with loess or gam, formula and data can be passed to smooth_data via the ... argument (in lieu of y).

In this case, the formula should specify the response (e.g. density) and predictors. For gam smoothing, the formula should typically be of the format y ~ s(x), which uses s to smooth the data. The data argument should be a data.frame containing the variables in the formula. In such cases, subset_by can still be specified as a vector with length nrow(data).
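The formula and data interface described above might be used roughly as follows. This is a sketch under assumptions: gcplyr and mgcv are installed, and the data frame dat with its columns well, time, and density is synthetic and invented for illustration.

```r
library(gcplyr)

# Synthetic data: two noisy logistic growth curves (illustrative only)
set.seed(1)
time <- seq(0, 24, by = 0.5)
n <- length(time)
dat <- data.frame(
  well = rep(c("A1", "A2"), each = n),
  time = rep(time, times = 2),
  density = c(1 / (1 + exp(-(time - 10) / 2)),
              1 / (1 + exp(-(time - 14) / 2))) + rnorm(2 * n, sd = 0.02)
)

# loess: response ~ predictor, with span passed through ...
sm_loess <- smooth_data(formula = density ~ time, data = dat,
                        sm_method = "loess", span = 0.5,
                        subset_by = dat$well)

# gam: the predictor is wrapped in s() so that it is smoothed
sm_gam <- smooth_data(formula = density ~ s(time), data = dat,
                      sm_method = "gam",
                      subset_by = dat$well)
```

In both calls y is omitted, and subset_by is a vector with one entry per row of dat so that each well is smoothed separately.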