This document was made in the RStudio environment and expects that the deSolve, rootSolve, coda, and FME R-packages have been installed beforehand (which can be done via Install Packages in the Tools menu of RStudio). To start a Grind session, first source the grind.R script:

source("grind.R")
## grind.R (15-03-2024) was sourced

Grind needs to be sourced to define its functions. This document only uses run() and fit().

The Lotka-Volterra model

The Lotka-Volterra model, \(R'=rR(1-R/K)-aRN\) and \(N'=caRN-dN\), is written as

model <- function(t, state, parms) {
  with(as.list(c(state,parms)), {
    dR <- r*R*(1 - R/K) - a*R*N  # logistic growth of the prey minus predation
    dN <- c*a*R*N - d*N          # predator growth minus death
    return(list(c(dR, dN)))
  })
}

where the arguments t, state, parms provide the time, the state \((R,N)\), and the parameters (as defined by the vector p) to the equations. See this basic tutorial for more info on defining models and installing Grind.
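As a quick illustrative check, one can call model() directly with a time, a state, and a parameter vector (the values of p and s below are the ones used in the next section):

p <- c(r=1,K=1,a=1,c=1,d=0.5)
s <- c(R=1,N=0.01)
model(0,s,p) # returns the derivatives as list(c(dR,dN)), here list(c(-0.01,0.005))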

Make an artificial data set by numerical integration

Make a data set by calling run(25,table=TRUE) for the parameter values \(r=1,K=1,a=1,c=1\), and \(d=0.5\), and the initial condition \(R=1\) and \(N=0.01\). Finally, add 10% noise and plot the “data” using points():

p <- c(r=1,K=1,a=1,c=1,d=0.5)
s <- c(R=1,N=0.01)
data <- run(25,table=TRUE)
data$R <- data$R*rnorm(26,1,0.1)
data$N <- data$N*rnorm(26,1,0.1)
points(data$time,data$R,col="red")
points(data$time,data$N,col="blue")

Change all parameter values and the state a little with a 10% standard deviation, and estimate their original values by fitting the data. In this example all parameters (including the initial condition) are free to be estimated (lower=0 guarantees that they remain non-negative).

s <- s*abs(rnorm(2,1,0.1))
p <- p*abs(rnorm(5,1,0.1))
f <- fit(data,free=c(names(s),names(p)),lower=0)
## SSR: 0.111713  Estimates:
##           R           N           r           K           a           c 
## 1.071874388 0.007842814 0.812418122 0.960167261 0.770793801 1.425728852 
##           d 
## 0.555882573

which yields a reasonable estimate of all parameter values:

print(c(s,p));print(f$par)
##           R           N           r           K           a           c 
## 1.002606960 0.009864857 0.938853487 1.280958060 0.990815092 0.927679938 
##           d 
## 0.487779152
##           R           N           r           K           a           c 
## 1.071874388 0.007842814 0.812418122 0.960167261 0.770793801 1.425728852 
##           d 
## 0.555882573

Check the identifiability of all parameters by asking for a summary. Parameters should not be strongly correlated and their estimates should be significantly different from zero:

summary(f)
## 
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)    
## R 1.071874   0.046717  22.944  < 2e-16 ***
## N 0.007843   0.003380   2.320 0.024917 *  
## r 0.812418   0.150916   5.383 2.54e-06 ***
## K 0.960167   0.027658  34.715  < 2e-16 ***
## a 0.770794   0.150499   5.122 6.13e-06 ***
## c 1.425729   0.335367   4.251 0.000106 ***
## d 0.555883   0.073151   7.599 1.33e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04982 on 45 degrees of freedom
## 
## Parameter correlation:
##          R       N        r       K        a        c        d
## R  1.00000 -0.2088  0.01794 -0.1934 -0.02098  0.08656  0.12703
## N -0.20883  1.0000 -0.25009  0.4285 -0.15365 -0.33487 -0.82956
## r  0.01794 -0.2501  1.00000  0.3169  0.96612 -0.74268  0.06425
## K -0.19344  0.4285  0.31688  1.0000  0.47588 -0.73728 -0.63790
## a -0.02098 -0.1537  0.96612  0.4759  1.00000 -0.85612 -0.11536
## c  0.08656 -0.3349 -0.74268 -0.7373 -0.85612  1.00000  0.60436
## d  0.12703 -0.8296  0.06425 -0.6379 -0.11536  0.60436  1.00000

Note that \(r\) and \(a\) are strongly correlated in this example (0.97), which means they can hardly be estimated independently from these data. An alternative approach to check the confidence ranges of parameters is to bootstrap the data, i.e., resample the data and refit the model. The option bootstrap=100 resamples and fits the data 100 times.
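A minimal sketch of such a call (same data and free parameters as in the fit above):

f <- fit(data,free=c(names(s),names(p)),lower=0,bootstrap=100)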

More than one data set

The above example considered a single data set with two columns, both of which were predicted by the model used to fit the data.
If one has more than one data set, the first argument of fit() should be a list of all data sets. Split the data into two separate data sets, and fit both:

dataR <- data; dataR$N <- NULL
dataN <- data; dataN$R <- NULL
par(mfrow=c(1,2))
f <- fit(list(dataR,dataN),free=c(names(s),names(p)),lower=0,main=c("A","B"))
## SSR: 0.111713  Estimates:
##           R           N           r           K           a           c 
## 1.071901100 0.007845081 0.812425753 0.960181141 0.770801507 1.425641010 
##           d 
## 0.555863699

which should provide practically the same estimates.

Data sets with different parameters

Things become more tricky when data sets have both shared and unique parameters. Let’s make a second data set with a different initial condition and a different carrying capacity:

p <- c(r=1,K=0.75,a=1,c=1,d=0.5)
s <- c(R=0.5,N=0.05)
data2 <- run(25,ymax=1,table=TRUE)
data2$R <- data2$R*rnorm(26,1,0.1)
data2$N <- data2$N*rnorm(26,1,0.1)
points(data2$time,data2$R,col="red")
points(data2$time,data2$N,col="blue")

Next pick an average guess for the initial condition and \(K\), change the parameters a little, and use the option differ=c("R","N","K") to define that \(R\), \(N\), and \(K\) are free and differ between the two data sets (the other free parameters are shared):

s <- c(R=0.75,N=0.02); p["K"] <- 0.85
p <- p*abs(rnorm(5,1,0.1))
par(mfrow=c(1,2))
f <- fit(list(data,data2),free=names(p),differ=c("R","N","K"),lower=0,main=c("A","B"))
## SSR: 0.1967473  Estimates:
##           r           a           c           d           R           N 
## 0.835913956 0.778838644 1.433373036 0.563996366 1.075003775 0.007610259 
##           K           R           N           K 
## 0.953271513 0.593715374 0.044576241 0.726364573

where the first \(R\), \(N\), and \(K\) correspond to the first data set, and the second \(R\), \(N\), and \(K\) to the second data set. Use a list for differ if one needs to provide individual initial guesses for the free parameters that differ, e.g., differ=list(R=c(0.9,0.55),N=c(0.02,0.04),K=c(1.1,0.7)), as sketched below.
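For instance, a sketch of such a call, using the illustrative initial guesses given above:

par(mfrow=c(1,2))
f <- fit(list(data,data2),free=names(p),differ=list(R=c(0.9,0.55),N=c(0.02,0.04),K=c(1.1,0.7)),lower=0,main=c("A","B"))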

Finally, if data sets differ in parameters that are known, and hence should not be estimated, one can use the option fixed to define fixed parameters that differ between data sets, in a manner very similar to the option differ. Assuming we know the true initial conditions \((R,N)=(1,0.01)\) and \((R,N)=(0.5,0.05)\), one would write

fixed <- list(R=c(1,0.5),N=c(0.01,0.05))
par(mfrow=c(1,2))
f <- fit(list(data,data2),free=names(p),differ="K",fixed=fixed,lower=0,main=c("A","B"))
## SSR: 0.2161929  Estimates:
##         r         a         c         d         K         K 
## 0.9408159 0.8976794 1.1611255 0.5232829 0.9723061 0.7364806

Fitting models providing a solution rather than derivatives

Fitting goes much faster if the analytic solution of a model is known, and run() and fit() allow for this via the option solution=TRUE. For example, consider the solution of the logistic growth equation \[N'=rN(1-N/K) \quad\to\quad N(t)=\frac{K}{1+\mathrm{e}^{-rt}(K/N(0)-1)} \ ,\] and fit that solution to artificial data (generated by the same model)

model <- function(t, state, parms) {
  with(as.list(c(state,parms)), {
    Nt <- K/(1+exp(-r*t)*(K/N-1))  # explicit solution N(t), with N the initial value N(0)
    return(list(c(Nt)))
  })
}
p <- c(r=1,K=1)
s <- c(N=0.01)
f <- run(10,0.1,solution=TRUE,legend=FALSE)
data <- run(10,1,solution=TRUE,table=TRUE,timeplot=FALSE)
data$N <- data$N*rnorm(11,1,0.1)
points(data$time,data$N)

f <- fit(solution=TRUE)
## SSR: 0.02094403  Estimates:
##           N           r           K 
## 0.006893928 1.059963352 1.029152698

where fit() fits the data allowing all parameters and \(N(0)\) to be free.

Important options of fit()

Grind’s function fit() calls FME’s function modFit(), and the default method used by modFit() is the Levenberg-Marquardt algorithm, which is fast, but easily gets stuck in local optima. Calling fit() with the modFit() option method="Pseudo" (which requires lower and upper bounds on the estimates) takes longer, but is much better at finding global optima (see the manual of FME, or seek help by typing ?modFit).
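For instance, a sketch of such a call, continuing the logistic-growth example above (the upper bound of 10 is an arbitrary illustrative choice; the method argument is passed on to modFit()):

f <- fit(data,free=c(names(s),names(p)),solution=TRUE,lower=0,upper=10,method="Pseudo")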

Another important option is logpar=TRUE, which fits the logarithms of the parameters (rather than their true values). This is important when parameters have very different magnitudes (and need to stay positive). Grind will report the true values, but the list returned by modFit() contains the log values.
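A minimal sketch, again for the logistic-growth example, assuming all free parameters are strictly positive:

f <- fit(data,free=c(names(s),names(p)),solution=TRUE,logpar=TRUE)
exp(f$par) # f$par contains the log values; exponentiate to recover the true values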