Estimating and projecting the number of new HIV diagnoses and incidence in Spectrum's case surveillance and vital registration tool

Supplemental Digital Content is available in the text


A Simple Model for HIV infected individuals
In this paragraph, we consider a birth cohort in a population in which HIV infection is spreading over time. Let us assume that we are following the cohort of people alive between times 0 and t. Let w, v and s be the respective times of infection, diagnosis and treatment initiation; and let S(t) be the susceptible population and let I un (t, w), I du (t, w), I dt (t, v, w) be the undiagnosed, diagnosed untreated, and diagnosed and treated infected populations, respectively. Figure 1 illustrates the flow of individuals in our population in the compartments considered here. That flow can be described by the following system: ∂S(t) ∂t = − (λ(t) + µ(t)) S(t) I un (t, t) = λ(t)S(t) ∂I un ∂t + ∂I un ∂w = − (m un (t, w) + δ(t, w) + µ(t)) I un (t, w) I du (t, t, w) = δ(t, w)I un (t, w) ∂I du ∂t + ∂I du ∂w + ∂I du ∂v = − (m un (t, w) + η(t, v, w) + µ(t)) I du (t, v, w) I dt (t, t, v, w) = η(t, v, w)I du (t, v, w) ∂I dt ∂t + ∂I dt ∂w + ∂I dt ∂v + ∂I dt ∂s = − (m dt (t, v, w) + µ(t)) I dt (t, s, v, w) where λ is the incidence function; µ is the background mortality; δ is the diagnosis rate; m un is the mortality among HIV infected and undiagnosed; m du is the mortality among infected, diagnosed and untreated; η is the treatment initiation among diagnosed individuals; and m dt is the mortality rate among infected, diagnosed and treated individuals. This shows that, in order to tract infection, diagnosis and treatment initiation, a minimum of 10 compartment should be considered by age or risk group. In order to avoid this, we consider instead a simplified version in which infection, diagnosis and treatment initiation times are not tracked. The system reduces to a system of ODE: where the symbol˜put on top of population type, indicates that a sum was taken over all the possible infection time, diagnosis time and/or treatment initiation; and the the overall rates in the populations of interest. The solution to (2) can be otained from the solution of (1) if we have: We can obtain the time to diagnosis and CD4 at diagnosis using formulae similar to (3) and (4). In fact, the mean time from infection to diagnosis is given by (7) and, if g(t, w) gives the CD4 trajectory as a function of infection time, then the mean CD4 for newly diagniosed individuals,g, is given by (8).
Now, we propose to approximate the solution to the System (3)-(6) using the following scheme: where E 0,t = e λtτ , E 1,t = e −(mun,t+δt)τ , E 2,t = e −(m du,t +ηt)τ and E 3,t = e −m dt,t τ , and the subscripts indexed by time (t or t + τ ) indicate the dependency with respect to time. Note that the background mortality was dropped from (9)-to-(12) because we are only interested in proportions; the main simulation is performed using Spectrum's AIM. In fact, for each bith cohort, we solve the system (9)-to-(12) with the initial condition S 0 = 1, I un,0 = I du,0 = I dt,0 = 0 then, for each time, the proportion of undiagnosed, q t is given by (13).
and the proportion of PLHIV newly diagnosed (between t and t + τ ), p new,t+τ is given by (14).
In this work, we assume that the diagnosis rate is proportional to mortality rate in absence of treatment, i.e.
and Γ is a Gamma cummulative distribution function with shape z 1 and scale 1, z 2 is a scale factor, and t 0 is the year when the first diagnosis was observed.
Formulae (3)-to-(8) do not appear very practical because they involve integrals of functions. Their discretized versions are used instead. Furthermore, the closed forms of functions δ(t, w), g(t, w), m un (t, w), η(t, v, w) and m dt (t, v, w) are not directly available from Spectrum. We discuss their construction in Section 1.1. Figure 1: Flow chart for the HIV infected population. λ(w) is the incidence hazard rate at time w, I un (t, w) is the infected population not yet diagnosed, I du (t, w) is the infected diagnosed but who have not started treatment yet, I dt (t, w) is the infected population on treatment, δ(t, w) is the rate at which individuals are diagnosed, m un (t, w) is the mortality rate for undiagnosed, m dt (t, s, v, w) is the mortality for those who started treatment at time s and η(t, v, w) is the treatment initiation rate.

CD4 trajectories, mortality and diagnosis rates
We obtain the CD4 trajectory as a function of age at infection using Spectrum progression rates as follows. Let a 0 be the age at infection and c 0 the CD4 count at infection.
Define the age category, a1, the CD4 category, c1, the time of change of CD4 category, u, the time of change of category, v. Now, for each CD4 category c1, let C c1 be the largest CD4 value for that category and let ν a1,c1 be the CD4 progression rate from CD4 category c1 to the lower CD4 category, c1 + 1, for individuals in the age category a1. Let A 1 denote the maximum age for individuals in the age category a1.
Set a = a 0 , c = c 0 and repeat the following steps.

Algorithm 1
Step 1 Calculate: Step 2 Do: • If progression to the next CD4 category occurs before age category change, i.e. u < v, then: • Else (change of age caterogy happens before CD4 category threshold), then: Step 3 Do: • If minimum CD4 or maximum age is reached, then Stop.
• Else go to Step 1.
The above algorithm can give the CD4 trajectory of infected people as a function of both time (or, equivalently, age) and CD4 at infection,f · (t, w, c 0 ), where (· = x for men or y for women). In order to obtain the trajectory as a function of time only, we integrate that function over the CD4 distribution at infection, using Spectrum assumption regarding that distribution; i.e.
where π a−t+w,c0 is the probability that an individual who became infected at age a − t + w had CD4 count in the category c 0 at infection. Figures 2 (a) and (b) displays CD4 trajectories as a function of age at infection for both women and men, obtained as described in Algorithm 1 for developed countries, using Spectrums assumptions.
In order to obtain mortality rates as a function of time and age at infection for undiagnosed individuals, one can follow the the CD4 trajectory and assign the mortality rate that corresponds to the CD4 category. More precisely, to get mortality for individuals infected at age a0 (or time w), we first obtain their CD4 at time t,f · (t, w) then, we use the CD4 category obtain that way to get the mortality as assumed by Spectrum. Figures 3 (a) and (b) illustrates mortality rates as a function of age at infection for women and men, respectively. Similarly, we can obtain mortality as a function of age at infection and age at ART initiation; Figures 4 (a)and (b) illustrate the change in mortality rate as a function of age at ART initiation, for women in developed countries who became infected at 15 and 30 years, respectively. Finally, because we can track CD4 trajectories as a function of time since infection, one can easilly specify the treatment initialtion rate η(t, v, w) as a function of WHO guidelines. More specifically, we assume that, for diagnosed individuals, the time to ART initiation when eligibility criteria are met follow an Exponential distribution with mean 3 months. Moreover, diagnosed individuals who do not meet treatment eligibility also intiate treatment at a rate proportional to HIV related mortality; i.e., if µ 0 is HIV related mortality at treatment initiation and µ c is mortality rate of noneligible individuals in the CD4 category c, then treatment itiation rate for these individuals is 3 µc µ0 e ǫ 1+e ǫ .

Incidence Options
Double and single logistic curves. Prior this work, the principle supporting the CSVAR fitting tool was to choose a family of parametric curves, vary the parameter values and retain the set that provides the best fit to the data, using the either maximum likelihood estimation method or by minimizing the Chi-squared distance (see [2]). The family of candidate curves consisted of double logistic and single logistic functions (see [2]). The double logistic curve was parameterized by 5 parameters while the single logistic was parameterized by two parameters. To be more specific, the double logistic incidence curve was given by (15) and the single logistic incidence curve was given by (16) This contains functions flexible enough to capture most HIV epidemic trends. However, for a handful of countries, the best curve obtained from that family was not satisfactory.
In the Bayesian framework, the double logistic model is fitted with the following prior distribution on its parameters: Second Order Segmented polynomials. We included second order segmented polynomial functions (see [3]) as an option, to circumvent the limitations of the single and double logistic curves. With this choice, we can estimate the position of knots. Although this family of functions is very flexible, the number of parameters needed can be relatively large. Furthermore, the functions in that family are not naturally constrained to be non negative. One can  use the augmented Lagragian to enforce constraints on the incidence or, alternatively, in order not increase the complexity of the problem, we propose to use a transformation of the spline. In fact, we used the transformtation x → i max x 2 1+x 2 , where i max is a prior bound on the incidence rate; i.e., for this model, the incidence is given by (19): were λ max is the largest possible value allowed for the incidence rate, and for t ∈ (t k−1 , t k ) t 0 is the start year of the epidemic, a k , b k , k = 0, 1, 2, 3 and t k , k = 1, 2 are parameters to be estimated. In Bayesian analyses this model was fitted with the following prior distributions: where ζ = (ζ 1 , ζ 2 , ζ 3 ) is the inverse of the transformation (x 1 , Transmission model using the rlogistic function. Instead of directly modelling the HIV incidence rate directly, we also consider modelling the transmission rate r(t), as in the Estimation Projection Package (EPP) model [1]. In this case, the incidence rate is given by (21).
where p(t) and κ(t) are the prevalence and ART coverage at time t, respectively, and 0.7 is the average reduction in transmission per additional person on ART. We use a logistic function to model the logarithm of r(t), termed rlogistic with four parameters: where exp(r 0 ) is the initial exponential growth rate of the epidemic, exp(r ∞ ) is the equilibrium value for r(t), α is the rate of change of r(t) in the log-scale, and t mid is the inflection point. For this model, we additionally specify a fifth parameter, ι, as the incidence rate at time t = t 0 , providing the initial pulse of infections. This model is fitted with prior distributions on its parameter: (n u (t uj ) log( n u (θ, t uj )) − n u (θ, t uj )) , where we adopted the convention that the sum over an empty set is zero; θ is the vector of parameter values, and the hat over quantities indicates Spectrum predictions when the incidence curve is defined by θ.
For the minimum Chi-Squared distance method, the loss function was rather given by (24).
Note that, in the previous version, it was not possible to get the number of new diagnoses directly from Spectrum. The number of new diagnoses was used to obtain a crude approximation of the number of new incident cases and vice versa.

New version
The models presented in Sections 1 allow avoiding the assumptions needed to derive the number of incident cases at the cost of adding two more parameters used to model the diagnosis rate as a function of time. We assume that the observed CD4 at diagnosis follow a Gamma distribution with parameter (α, β). Let θ = (θ ′ , α, β, ǫ) be the parameter under the new version (where θ ′ is the incidence parameter, and ǫ models treatment initiation for in dividuals who don't meet WHO eligibility criteria). Now, we assume that mean CD4 can only be measured in years when new diagnoses are observed and their number in greater that 1.
Let n a (t aj ), j = 1 . . . j a be the number of people on ART reported by the country. We assume here that the numbers observed follow Poisson distributions, and that the mean CD4 at diagnosis follow Gamma distributions. If we keep the notations of the previous paragraphs, the loss function given by (23) becomes: nlik(θ) = − u=i,d,h ju j=1 n u (t uj ) log( n ( u θ, t uj )) − n u (θ, t uj ) − ja j=1 (n a (t aj ) log( n a (θ, t aj )) − n a (θ, t aj )) − j d j=1 (αn d (t j ) − 1) g j (t j ) log( g j (t j )) − αn d (t j ) g j (θ, t j ), We fitted the models by maximizing the posterior likelihood, which is equivalent to minimizing: where P 1 is determined by (17), (18), (20) or (22), and P 2 is such that log(z 1 ) ∼ N (2.7, 10), log(z 2 ) ∼ N (1.5, 10), and ǫ ∼ N (0, 1). Kernel Hamiltonian Monte Carlo [4] was implemented as an option for the calibration.