This document was made in the RStudio environment, and expects that you have installed the deSolve, rootSolve, coda, and FME R-packages beforehand (which can be done via Install Packages in the Tools menu of RStudio). To start a session in Grind, first source the grind.R script:
source("grind.R")
## grind.R (15-03-2024) was sourced
Grind needs to be sourced to define its functions; this document uses only run() and fit().
The Lotka-Volterra model, \(R'=rR(1-R/K)-aRN\) and \(N'=caRN-dN\), is written as
model <- function(t, state, parms) {
  with(as.list(c(state,parms)), {
    dR <- r*R*(1 - R/K) - a*R*N
    dN <- c*a*R*N - d*N
    return(list(c(dR, dN)))
  })
}
where the arguments t, state, parms provide the time, the state \((R,N)\), and the parameters (as defined by the vector p) to the equations. See this basic tutorial for more info on defining models and installing Grind.
Make a data set by calling run(25,table=TRUE) for the parameter values \(r=1,K=1,a=1,c=1\), and \(d=0.5\), and the initial condition \(R=1\) and \(N=0.01\). Finally, add 10% noise and plot the “data” using points():
p <- c(r=1,K=1,a=1,c=1,d=0.5)
s <- c(R=1,N=0.01)
data <- run(25,table=TRUE)
data$R <- data$R*rnorm(26,1,0.1)
data$N <- data$N*rnorm(26,1,0.1)
points(data$time,data$R,col="red")
points(data$time,data$N,col="blue")
Change all parameter values and the state a little with a 10% standard deviation, and estimate their original values by fitting the data. In this example all parameters (including the initial condition) are free to be estimated (lower=0 guarantees that they stay larger than zero).
s <- s*abs(rnorm(2,1,0.1))
p <- p*abs(rnorm(5,1,0.1))
f <- fit(data,free=c(names(s),names(p)),lower=0)
## SSR: 0.111713 Estimates:
## R N r K a c
## 1.071874388 0.007842814 0.812418122 0.960167261 0.770793801 1.425728852
## d
## 0.555882573
which yields a reasonable estimate of all parameter values:
print(c(s,p));print(f$par)
## R N r K a c
## 1.002606960 0.009864857 0.938853487 1.280958060 0.990815092 0.927679938
## d
## 0.487779152
## R N r K a c
## 1.071874388 0.007842814 0.812418122 0.960167261 0.770793801 1.425728852
## d
## 0.555882573
Check the identifiability of all parameters by asking for a summary. Parameters should not be correlated and should be significantly different from zero:
summary(f)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## R 1.071874 0.046717 22.944 < 2e-16 ***
## N 0.007843 0.003380 2.320 0.024917 *
## r 0.812418 0.150916 5.383 2.54e-06 ***
## K 0.960167 0.027658 34.715 < 2e-16 ***
## a 0.770794 0.150499 5.122 6.13e-06 ***
## c 1.425729 0.335367 4.251 0.000106 ***
## d 0.555883 0.073151 7.599 1.33e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04982 on 45 degrees of freedom
##
## Parameter correlation:
## R N r K a c d
## R 1.00000 -0.2088 0.01794 -0.1934 -0.02098 0.08656 0.12703
## N -0.20883 1.0000 -0.25009 0.4285 -0.15365 -0.33487 -0.82956
## r 0.01794 -0.2501 1.00000 0.3169 0.96612 -0.74268 0.06425
## K -0.19344 0.4285 0.31688 1.0000 0.47588 -0.73728 -0.63790
## a -0.02098 -0.1537 0.96612 0.4759 1.00000 -0.85612 -0.11536
## c 0.08656 -0.3349 -0.74268 -0.7373 -0.85612 1.00000 0.60436
## d 0.12703 -0.8296 0.06425 -0.6379 -0.11536 0.60436 1.00000
An alternative approach to check the confidence ranges of parameters is to bootstrap the data, i.e., resample the data and refit them. The option bootstrap=100 resamples and fits the data 100 times.
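For instance, the earlier fit could be bootstrapped as follows (a sketch reusing the free and lower arguments from the example above; the resulting confidence ranges depend on the random resampling):

```r
# Resample the data and refit 100 times to obtain bootstrap
# confidence ranges for the estimates
f <- fit(data, free=c(names(s),names(p)), lower=0, bootstrap=100)
```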
The above example considered a single data set with two columns, with values that were both predicted by the model used to fit the data. If one has more than one data set, the first argument of fit() should be a list of all data sets. Split the data into two separate data sets, and fit both:
dataR <- data; dataR$N <- NULL
dataN <- data; dataN$R <- NULL
par(mfrow=c(1,2))
f <- fit(list(dataR,dataN),free=c(names(s),names(p)),lower=0,main=c("A","B"))
## SSR: 0.111713 Estimates:
## R N r K a c
## 1.071901100 0.007845081 0.812425753 0.960181141 0.770801507 1.425641010
## d
## 0.555863699
which should provide the same estimates.
Things become more tricky when data sets have both shared and unique parameters. Let’s make a second data set with a different initial condition and a different carrying capacity:
p <- c(r=1,K=0.75,a=1,c=1,d=0.5)
s <- c(R=0.5,N=0.05)
data2 <- run(25,ymax=1,table=TRUE)
data2$R <- data2$R*rnorm(26,1,0.1)
data2$N <- data2$N*rnorm(26,1,0.1)
points(data2$time,data2$R,col="red")
points(data2$time,data2$N,col="blue")
Next pick an average guess for the initial condition and \(K\), change the parameters a little, and use the option differ=c("R","N","K") to specify that \(R\), \(N\), and \(K\) are free and differ between the two data sets (the other free parameters are shared):
s <- c(R=0.75,N=0.02); p["K"] <- 0.85
p <- p*abs(rnorm(5,1,0.1))
par(mfrow=c(1,2))
f <- fit(list(data,data2),free=names(p),differ=c("R","N","K"),lower=0,main=c("A","B"))
## SSR: 0.1967473 Estimates:
## r a c d R N
## 0.835913956 0.778838644 1.433373036 0.563996366 1.075003775 0.007610259
## K R N K
## 0.953271513 0.593715374 0.044576241 0.726364573
where the first \(R\), \(N\), and \(K\) correspond to the first data set, and the second \(R\), \(N\), and \(K\) to the second set of data. Use a list for differ if one needs to provide individual initial guesses for the free parameters that differ, e.g., differ=list(R=c(0.9,0.55),N=c(0.02,0.04),K=c(1.1,0.7)).
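For example, the fit above could be called with such a list (a sketch; the initial guesses are the illustrative values just mentioned, and the first element of each vector refers to the first data set):

```r
# Individual initial guesses for the parameters that differ
# between the two data sets (first element: data, second: data2)
differ <- list(R=c(0.9,0.55), N=c(0.02,0.04), K=c(1.1,0.7))
f <- fit(list(data,data2), free=names(p), differ=differ, lower=0, main=c("A","B"))
```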
Finally, if data sets differ in parameters that are known, and should not be estimated, one can use the option fixed to define fixed parameters that differ, in a manner very similar to the option differ. Assuming we know the true initial conditions \((R,N)=(1,0.01)\) and \((R,N)=(0.5,0.05)\), one would write
fixed <- list(R=c(1,0.5),N=c(0.01,0.05))
par(mfrow=c(1,2))
f <- fit(list(data,data2),free=names(p),differ="K",fixed=fixed,lower=0,main=c("A","B"))
## SSR: 0.2161929 Estimates:
## r a c d K K
## 0.9408159 0.8976794 1.1611255 0.5232829 0.9723061 0.7364806
Fitting goes much faster if the analytic solution of a model is known, and run() and fit() allow for this via the option solution=TRUE. For example, consider the solution of the logistic growth equation \[N'=rN(1-N/K) \quad\to\quad N(t)=\frac{K}{1+\mathrm{e}^{-rt}(K/N(0)-1)} \ ,\] and fit that solution to artificial data (generated by the same model):
model <- function(t, state, parms) {
  with(as.list(c(state,parms)), {
    Nt <- K/(1+exp(-r*t)*(K/N-1))
    return(list(c(Nt)))
  })
}
p <- c(r=1,K=1)
s <- c(N=0.01)
f <- run(10,0.1,solution=TRUE,legend=FALSE)
data <- run(10,1,solution=TRUE,table=TRUE,timeplot=FALSE)
data$N <- data$N*rnorm(11,1,0.1)
points(data$time,data$N)
f <- fit(solution=TRUE)
## SSR: 0.02094403 Estimates:
## N r K
## 0.006893928 1.059963352 1.029152698
where fit() fits the data allowing all parameters and \(N(0)\) to be free.
Grind’s function fit() calls FME’s function modFit(), and the default method used by modFit() is the Levenberg-Marquardt algorithm, which is fast, but easily gets stuck in local optima. Calling fit() with the modFit() option method="Pseudo" (which requires lower and upper bounds on the estimates) takes longer, but is much better at finding global optima (see the manual of FME, or seek help by typing ?modFit).
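A call using the pseudo-random search could look like this (a sketch; the upper bound of 10 is an arbitrary choice for illustration, and should be adapted to the plausible range of each parameter):

```r
# method="Pseudo" requires both lower and upper bounds on all estimates;
# it is slower than Levenberg-Marquardt but less likely to get stuck
f <- fit(data, free=c(names(s),names(p)), lower=0, upper=10, method="Pseudo")
```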
Another important option is logpar=TRUE, which fits the logarithm of the parameters (rather than their true values). This is important when parameters have very different magnitudes (and need to stay positive). Grind will report the true values, but the list returned by modFit() contains the log values.
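For example (a sketch; as stated above, Grind prints the back-transformed estimates, while the returned list holds the log values, so exp() recovers the true values by hand):

```r
# Fit the logarithm of the parameters so they stay positive and
# parameters of very different magnitudes are scaled comparably
f <- fit(data, free=c(names(s),names(p)), logpar=TRUE)
exp(f$par)  # back-transform the log values returned by modFit()
```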