Skip to contents

Introduction

This is a vignette to demonstrate potential use of functions in the R package branchingprocess to quantify outbreak risk posed by an infectious disease with emerging transmission potential.

Some emerging infectious organisms have the ability to transmit from person to person, but the risk of the organism causing an outbreak that infects many individuals over multiple generations of transmission is unclear. Perhaps a pathogen exists primarily in non-human animal reservoirs but occasionally spills over to humans, and those humans may or may not transmit to one or two family members. Sometimes, a spillover event leads to an outbreak infecting dozens of people over a few generations of transmission before fizzling out.

What are the chances that a new spillover event will lead to an ever larger outbreak than what has been seen so far? Is it possible that an extremely large outbreak has a reasonable chance of happening and has been avoided so far only by random chance? The functions in this package can help provide quantitative answers to these questions.

Offspring distribution

A branching process is a stochastic (random) process in which each individual in generation nn produces a random number of individuals, or “offspring,” in generation n+1n+1, continuing for some number of generations or until there are no individuals remaining. If no individuals remain, i.e. every individual in the last generation produced no offspring, we call this extinction.

The random number of next-generation individuals produced by each individual is drawn from the offspring distribution, a discrete probability distribution with non-negative range. To model infectious disease outbreaks, it is common to use a negative binomial offspring distribution with two parameters: the mean RR and dispersion parameter kk.

RR is the reproduction number, the expected (mean) number of transmissions from an infected individual. For a fixed value of RR, a lower dispersion kk gives a higher variance of the negative binomial distribution: variance = R+R2/kR + R^2/k. Infectious diseases with a lower value of kk exhibit a higher rate of “superspreading” events, in which a single individual produces a number of transmission substantially higher than the mean.

This parameterization of the negative binomial distribution is equivalent to using mu = R and size = k in the native NegBinomial, e.g. dnbinom(x, mu=R, size=k) would give the density, i.e. the probability of exactly xx transmissions from one individual.

The function pNextGen(x,y,R,k) extends the dnbinom() function to more than one potentially transmitting individual. I.e. it calculates the probability that x individuals in one generation transmit to a total of y individuals in the next generation, where each of the x individuals independently transmits according the negative binomial distribution parameterized by R and k.

Example: With R=0.8R=0.8 and k=0.1k=0.1, and with 2 individuals infected, what is the probability that those 2 individuals will produce a total of exactly 5 transmissions in the next generation?

pNextGen(x=2, y=5, R=0.8, k=0.1)
#> [1] 0.02114661

There is about a 2.1% chance that the next generation will consist of exactly 5 infected individuals. Note that pNextGen() does not explicitly calculate the component probabilities for each of the two potential transmitters (e.g., one transmitting to 2 and other transmitting to 3, etc.). It is an efficient calculation for the probability that the total number of transmissions is 5.

Dispersion effects

We can observe some effects of the dispersion parameter kk by comparing part of the distribution for two examples:

pNextGen(x=2, y=c(0,1,5,10), R=0.8, k=10)*100
#> [1] 21.454820740 31.784919615  2.033695140  0.002137314
pNextGen(x=2, y=c(0,1,5,10), R=0.8, k=0.1)*100
#> [1] 64.4394015 11.4558936  2.1146614  0.6795566

The results show the probability (times 100, or % chance) that two individuals transmit to a total of exactly 0, 1, 5, or 10 when R=0.8R = 0.8, for two values of kk.

With a high value of k=10k=10, there is only a 21% chance that no transmissions occur, but the probability of a higher numbers of transmissions drops quickly: there is only a 0.002% or apporximately one-in-500 chance that 10 transmissions occur.

With a low value of k=0.1k=0.1, we see some very different results despite the fact that the reproduction number is the same. From the perspective of public health, a lower value of kk is a mixture of good and bad news. The good news is that the probability of no transmissions (the best case scenario) has risen to 64%, about 3 times higher than the result for k=10k=10. The bad news is that if transmissions do occur, there is a substantially higher chance that the number of transmissions will be high; for example there is a 0.8% chance of 10 total transmissions from the first two individuals.

Final size distribution

Of course, if the initial cases produce transmissions in the next generation, the outbreak may not end there. Those transmissions produce potentially infectious individuals in the second generation who can subsequently transmit to produce a third generation, and so on. If RR does not exceed 1, it is mathematically provable that the outbreak will eventual go extinct, i.e. there will be eventually be a generation of infected individuals who all produce zero transmissions. But how big could an outbreak get before that happens?

A common approach to analyzing the distribution of stochastic outbreak sizes is to run repeated simulations of outbreaks that randomize transmissions from each simulated individual and repeat, e.g. within a while loop, until no further transmissions can occur. However, there are mathematical formulae derived in the literature that can efficiently calculate the distribution of final outbreak sizes with no simulation needed.

The function pFinalSize() makes use of such a formula, for the case of the same negative binomial offspring distribution applied independently to each infected individual. For example,

pFinalSize(n=2, j=12, R=0.8, k=0.1)
#> [1] 0.006418035

This calculated the probability that, with R=0.8R=0.8 and k=0.1k=0.1 applied to every case, 2 initially infected individuals produce an outbreak that ends with a final size of 12, i.e. the 2 initial cases plus 10 subsequent cases, over any possible number of transmission generations.

Estimates from outbreak cluster data

Now let’s look at an example of how this function could be applied to study data from actual observed outbreak sizes. The H5N1 strain of influenza is currently of concern because of outbreaks among animals that sometimes cause spillover to humans. If these spillover events start leading to some substantial outbreaks involving human-to-human transmission, how could we use pFinalSize() to study these data?

Say there are 30 H5N1 human infections from zoonotic spillover events, i.e. 30 instances of human infection from exposure to animals. In 25 of the spillover events, no subsequent human-to-human occurred. In 3 of the spillover events, exactly one human-to-human transmission occurred from the initially infected human with no further transmission (final size of 2 infections). In one spillover event, there was an outbreak of final size 10 infections, and another spillover event produced an outbreak of final size 50 infections. What values of R and k would best characterize this set of observations?

We can make use of the pFinalSize() function to create a (log) likelihood function for this dataset of outbreak clusters. First, we write a function called pClusterSizes() that calculates the probabilities of the four different final sizes observed (1, 2, 10, and 50) for given values of R and k:

pClusterSizes <- function(R,k) pFinalSize(n=1, j=c(1,2,20,50), R=R, k=k)

The argument n to pFinalSize is the initial number of cases at the start of an outbreak, here set to n=1 because each of the observed H5N1 outbreak started from one initial infection (the spillover event). The argument j is the final size of the outbreak, here set as a vector of the four unique final sizes observed.

pClusterSizes(R = 1.5, k = 1) * 100
#> [1] 40.000000000  9.600000000  0.118406301  0.008701609

If the reproduction number R = 1.5, and the dispersion parameter k = 1, the function calculates a 40% chance that an outbreak of total size 1 (no tranmsission) would occur, a 9.6% chance of exactly size 2, a 0.12% chance of exactly size 10, and less than 0.01% chance of outbreak of exactly size 50.

Next we write a function logLikData() to calculate the log likelihood of observing 25, 3, 1, and 1 outbreak(s) of those four sizes, for given values of R and k:

logLikData <- function(R,k) sum(c(25,3,1,1)*log(pClusterSizes(R,k)))

What is the likelihood of observing exactly these 30 outbreak sizes after 30 spillover events, for our test values of R and k?

exp(logLikData(1.5,1))
#> [1] 1.026332e-20

Can we do better for different choices of R and k? We can use optim to locate the maximum likelihood solution for R and k by optimizing the log likelihood:

optim(f = function(x) ifelse(all(x>0), -logLikData(x[1],x[2]), Inf), par = c(R=1,k=1))$par
#>          R          k 
#> 0.70283388 0.07953917

The maximum likelihood estimate is R = 0.7, k = 0.08.

Risk assessment

With estimates based on past outbreaks in hand, what can we say about what may happen after future spillover events? What are the chances that we have not yet seen a “worst case scenario” and that even larger outbreaks are coming?

For example, from the estimates above with R = 0.7, k = 0.08, what is the probability that a singles spillover event will result in an outbreak of 100 or more cases, i.e. more than twice as big as the previous worst outbreak?

1-sum(pFinalSize(1,1:99,0.7,0.08))
#> [1] 0.004519336

The model estimates that there would be about 0.5% or roughly 1-in-200 chance of a 100-or-more case outbreak after a future spillover event.

This type of result is known as an “exceedance” probability. There is not a separate function for calculating exceedance, we only have a function to calculate the probability of an exact final size. Hence, the above function call to pFinalSize calculates the probability of a final size exactly equal to 1, 2, … 99, and one minus the sum of these probabilities is the exceedance probability required.

Control reproduction number

Once an outbreak is underway after an initial (e.g. spillover) event has taken place, the affected individuals and public health officials may take action to reduce the chances of further transmission. Efforts to reduce transmission may be especially intense after a symptomatic individual tests positive for an emerging, high-risk pathogen. To model that effect, the package provides functions to calculate the final size distribution for a branching process in which the offspring distribution parameters (e.g. the reproduction number) changes after a given number of transmission generations following the initial infection.

The function pFinalSizeSwitchOne(n,j,R0,k0,Rc,kc) calculates the probability that n initial cases lead to an outbreak of final size j given that the initial cases transmit with a negative binomial offspring distribution with parameters R = R0 (the basic reproduction number) and k = k0, and any/all subsequent cases transmit with a negative binomial offspring distribution with parameters R = Rc (the control reprodcution number) and k = kc.

For example,

fs1to10 <- Vectorize(pFinalSizeSwitch1)(n=1, j=1:10, R0=2, k0=0.1, Rc=0.8, kc=0.1)
names(fs1to10) <- paste0("size",1:10)
c(fs1to10,sizeover10 = 1-sum(fs1to10))
#>       size1       size2       size3       size4       size5       size6 
#> 0.737527249 0.056385121 0.027732372 0.017937705 0.013070964 0.010188188 
#>       size7       size8       size9      size10  sizeover10 
#> 0.008293508 0.006959109 0.005971689 0.005213379 0.110720714

provides information on a distribution of final sizes when one initial case transmits with R0 = 2, but any subsequent cases transmit with Rc = 0.8, with high variability (k0=kc=0.1).