This function calls other functions to smooth growth curve data
Usage
smooth_data(
  ...,
  x = NULL,
  y = NULL,
  sm_method,
  subset_by = NULL,
  return_fitobject = FALSE,
  warn_ungrouped = TRUE,
  warn_gam_no_s = TRUE
)
Arguments
- ...
Arguments passed to loess, gam, moving_average, moving_median, or smooth.spline. Typically includes tuning parameter(s), which in some cases are required. See Details for more information.
- x
An (often optional) vector of predictor values to smooth along (e.g. time)
- y
A vector of response values to be smoothed (e.g. density). If NULL, formula and data *must* be provided via ... (see Details).
- sm_method
Argument specifying which smoothing method should be used to smooth data. Options include "moving-average", "moving-median", "loess", "gam", and "smooth.spline".
- subset_by
An optional vector as long as y. y will be split by the unique values of this vector and the smoothed data for each group will be calculated independently of the others. This provides an internally-implemented approach similar to group_by and mutate.
- return_fitobject
Logical: whether the entire object returned by the fitting function should be returned. If FALSE, just the fitted values are returned.
- warn_ungrouped
Logical: whether a warning should be issued when smooth_data is called on ungrouped data and subset_by = NULL.
- warn_gam_no_s
Logical: whether a warning should be issued when gam is used without s() in the formula.
Value
If return_fitobject == FALSE:
A vector, the same length as y, with the now-smoothed y values.
If return_fitobject == TRUE:
A list the same length as unique(subset_by) where each element is an object of the same class as returned by the smoothing method (typically a named list-like object)
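As a minimal sketch of the two return types, the example below smooths simulated data for two wells. It assumes smooth_data is provided by the gcplyr package (this page does not name the package). The data frame dat and its columns well, time, and dens are hypothetical and simulated purely for illustration.

```r
# Assumption: smooth_data() is provided by the gcplyr package
library(gcplyr)

# Hypothetical data: noisy logistic growth curves for two wells
set.seed(1)
time <- seq(0, 24, by = 0.5)
dat <- data.frame(
  well = rep(c("A1", "A2"), each = length(time)),
  time = rep(time, times = 2)
)
dat$dens <- 1 / (1 + exp(-(dat$time - 12) / 2)) +
  rnorm(nrow(dat), sd = 0.02)

# Default (return_fitobject = FALSE): a vector of fitted values,
# the same length as y, with each well smoothed independently
smoothed <- smooth_data(
  x = dat$time, y = dat$dens,
  sm_method = "loess", span = 0.4,
  subset_by = dat$well
)

# return_fitobject = TRUE: a list with one fit object per group
fits <- smooth_data(
  x = dat$time, y = dat$dens,
  sm_method = "loess", span = 0.4,
  subset_by = dat$well,
  return_fitobject = TRUE
)
```

The vector form can be assigned straight back to a column (e.g. dat$dens_smoothed <- smoothed), while the list form is useful when the fitted models themselves need to be inspected.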
Details
For moving_average and moving_median, passing window_width or window_width_n via ... is required. window_width sets the width of the moving window in units of x, while window_width_n sets the width in units of number of data points. Larger values for either will produce more "smoothed" data.
For loess, the span argument sets the fraction of data points that should be included in each calculation. It's typically best to specify span explicitly, since the default of 0.75 is often too large for growth curve data. Larger values of span will produce more "smoothed" data.
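The tuning parameters for the moving-window methods and for loess can be sketched as follows. This is an illustration only: it assumes smooth_data comes from the gcplyr package, the vectors time and dens are simulated, and the specific widths and span are arbitrary choices rather than recommendations.

```r
# Assumption: smooth_data() is provided by the gcplyr package
library(gcplyr)

# Hypothetical data: one noisy growth curve, read every 0.25 time units
set.seed(1)
time <- seq(0, 24, by = 0.25)
dens <- 1 / (1 + exp(-(time - 12) / 2)) + rnorm(length(time), sd = 0.02)

# Moving average with the window given as a number of data points
sm_points <- smooth_data(
  x = time, y = dens, sm_method = "moving-average",
  window_width_n = 5, warn_ungrouped = FALSE
)

# Moving median with the window given in units of x
sm_units <- smooth_data(
  x = time, y = dens, sm_method = "moving-median",
  window_width = 1, warn_ungrouped = FALSE
)

# loess with an explicit span, smaller than the 0.75 default
sm_loess <- smooth_data(
  x = time, y = dens, sm_method = "loess",
  span = 0.3, warn_ungrouped = FALSE
)
```

Here warn_ungrouped = FALSE is set because the example contains a single curve, so no subset_by vector is needed.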
For gam, arguments to both gam and s can be provided via the ... argument. Most frequently, the k argument to s sets the number of "knots" the spline-fitting can use. Smaller values of k will produce more "smoothed" data.
When using sm_method = "gam", advanced users may also modify other parameters of s(), including the smoothing basis bs. These bases can be thin plate regression splines (bs = "tp", the default), cubic regression splines (bs = "cr"), or many other options (see s). I recommend leaving the default thin plate regression splines, whose main drawback is that they are computationally intensive to calculate. For growth curve data, this is unlikely to be relevant.
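A short sketch of passing s() arguments through ... when sm_method = "gam" is given below. It assumes smooth_data comes from the gcplyr package and that the mgcv package (which supplies gam and s) is installed. The simulated vectors and the particular values of k are hypothetical.

```r
# Assumptions: smooth_data() is provided by the gcplyr package,
# and mgcv (which supplies gam and s) is installed
library(gcplyr)

# Hypothetical data: one noisy growth curve
set.seed(1)
time <- seq(0, 24, by = 0.25)
dens <- 1 / (1 + exp(-(time - 12) / 2)) + rnorm(length(time), sd = 0.02)

# Few knots: a stiffer, more heavily smoothed curve
sm_k5 <- smooth_data(
  x = time, y = dens, sm_method = "gam",
  k = 5, warn_ungrouped = FALSE
)

# More knots: a more flexible curve that follows the data more closely
sm_k20 <- smooth_data(
  x = time, y = dens, sm_method = "gam",
  k = 20, warn_ungrouped = FALSE
)

# Same number of knots, but with a cubic regression spline basis
sm_cr <- smooth_data(
  x = time, y = dens, sm_method = "gam",
  k = 20, bs = "cr", warn_ungrouped = FALSE
)
```

Plotting the three results against the raw dens values is a quick way to choose a k that removes noise without flattening the growth curve.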
As an alternative to passing y, for more advanced needs with loess or gam, formula and data can be passed to smooth_data via the ... argument (in lieu of y).
In this case, the formula should specify the response (e.g. density) and predictors. For gam smoothing, the formula should typically be of the format y ~ s(x), which uses s to smooth the data. The data argument should be a data.frame containing the variables in the formula.
In such cases, subset_by can still be specified as a vector with length nrow(data).
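The formula-and-data interface described above might be used as in the sketch below. It assumes smooth_data comes from the gcplyr package and that mgcv is installed. The data frame dat and its columns well, time, and dens are hypothetical.

```r
# Assumptions: smooth_data() is provided by the gcplyr package,
# and mgcv is installed for sm_method = "gam"
library(gcplyr)

# Hypothetical data: noisy logistic growth curves for two wells
set.seed(1)
time <- seq(0, 24, by = 0.5)
dat <- data.frame(
  well = rep(c("A1", "A2"), each = length(time)),
  time = rep(time, times = 2)
)
dat$dens <- 1 / (1 + exp(-(dat$time - 12) / 2)) +
  rnorm(nrow(dat), sd = 0.02)

# Pass formula and data via ... instead of x and y;
# subset_by is a vector with length nrow(dat)
sm_formula <- smooth_data(
  formula = dens ~ s(time),
  data = dat,
  sm_method = "gam",
  subset_by = dat$well
)
```

Because the formula names columns of dat directly, x and y are left at their NULL defaults.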