Supporting online material
Evaluation of Respondent-Driven Sampling
McCreesh, N et al
Corresponding author: Dr Richard White, Department of Infectious Disease Epidemiology, Faculty of Epidemiology & Population Health, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT. Tel: + 44 (0) 20 7299 4626 Email: richard.white@lshtm.ac.uk
Supporting methods
Target population
The data used to define the target population were available from an ongoing general population cohort of 25 villages in rural Masaka, Uganda, covering an area of approximately 38 km² [1, 2] (main text Figure 1). Each year, households in the study villages are mapped and, after consent has been obtained, a total-population household census and an individual questionnaire are administered and blood is taken for HIV-1 testing.
The study villages are in southwestern Uganda, not far from Lake Victoria. The vast majority of dwellings are distributed throughout the countryside rather than clustered in villages, which mainly represent administrative areas demarcated on maps rather than population centres. The study population consists mostly of subsistence farmers, whose staple diet is matooke (cooking bananas) with groundnuts. There are no tarmac roads, and access may be difficult during the rains. People live in semi-permanent structures built from locally available materials. Levels of literacy are low, and the main income-earning activities are growing bananas, coffee and beans, and trading produce, including fish [3].
The data used in this study to identify the target population (village residence and head-of-household status) were collated from ongoing general population cohort surveys in 25 villages in rural Masaka carried out during the 12 months immediately before the start of the respondent-driven sampling (February 2009 to January 2010). A household was defined by the general population cohort staff as a group of people who share food and other resources. Head-of-household status was self-defined by the members of the household. The characteristics of the target population were estimated for the start date of the respondent-driven sampling (8 March 2010). Data on tribe, religion and date of birth were collated from any general population cohort survey. Household socioeconomic status was calculated using principal components analysis of household ownership of 22 items recorded during an annual census (December 2008 to October 2009), and categorised into quantiles based on the status of all households in the general population cohort villages. Data on the number of sexual partners in the preceding 12 months were collated from the most recent general population cohort survey round (carried out between December 2009 and October 2010) or, if this was unavailable, from the previous survey round (December 2008 to October 2009). HIV testing algorithms and laboratory methods are reported elsewhere [4]. Briefly, HIV status was determined by two independent immunoassays (Wellcozyme HIV-1 recombinant VK 56/57, Murex Biotech Ltd, Dartford, Kent, UK; and Recombigen HIV-1/2, Trinity Biotech plc, Galway, Ireland) and confirmed by western blot (Cambridge Biotech HIV-1 Western blot, Calypte Biomedical Corporation, Rockville, MD, USA). Current infection status was imputed based on earlier positive results or later negative results [1].
The target population consisted of 2402 men who were recorded as a male head of a household within the study villages between February 2009 and January 2010. Approximately equal proportions were aged under-30, 30-39, 40-49, and 50 or more years old (main text Table 1, Population proportion column). Membership of the four main tribal groups ranged from 70% Ganda to 2% Kiga. 60% were Catholic, 17% Protestant, and 23% Muslim. The proportion in each village ranged from 2% in village B to 9% in village Q. 42% reported one sexual partner in the preceding year and 6.3% were known to be HIV infected.
The respondent-driven sampling survey
People were eligible for the respondent-driven sampling if they were recorded as a male head of a household within the study villages between February 2009 and January 2010. Three interview sites were placed so as to minimise the maximum distance between the centre of any eligible village and the nearest interview site; the resulting maximum distance was 4 km (main text Figure 1).
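The supplement does not say how the site positions were computed. One way to frame the placement criterion is as a minimax (k-centre) problem. The sketch below does an exhaustive search over a hypothetical list of candidate site coordinates and assumes straight-line distances (the study's village distances, described under Spatial analysis, were measured along paths and roads). The function name `minimax_sites` and the toy coordinates are illustrative only.

```python
from itertools import combinations
from math import dist


def minimax_sites(villages, candidates, k=3):
    """Choose k sites from `candidates` minimising the largest distance
    from any village centre to its nearest site (exhaustive search)."""
    best, best_cost = None, float("inf")
    for sites in combinations(candidates, k):
        # distance from the worst-served village to its nearest site
        cost = max(min(dist(v, s) for s in sites) for v in villages)
        if cost < best_cost:
            best, best_cost = sites, cost
    return best, best_cost


# Hypothetical village centres in three clusters, plus midpoints as extra candidates
villages = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0), (21, 0)]
candidates = villages + [(0.5, 0), (10.5, 0), (20.5, 0)]
sites, worst = minimax_sites(villages, candidates, k=3)
```

Exhaustive search is only practical for small candidate lists; with many candidate locations a greedy or integer-programming approach to the k-centre problem would be the usual substitute.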
Seed selection
Ten seeds (a number based on the typical number used in respondent-driven sampling studies [5]) were selected from the target population. Total-population and GPS data were available on the target population, but because data of this quality are typically unavailable to researchers using respondent-driven sampling, these data were not used to select seeds. Instead, it was assumed that during a typical respondent-driven sampling study, pre-study mapping of the target population would yield limited information on the approximate geography, age and tribe distribution of the target population (e Table 9, left), and this information was used to propose the variation in these characteristics that would be sought in the seeds. The criteria were that one seed would come from each of ten areas covering the study villages and that two seeds would be in each of five age groups and each of five tribe groups (e Table 9, right). A list of candidate seeds was then drawn up in consultation with local community leaders by Medical Research Council employees with previous experience of working in these villages. For each of the ten geographic areas shown in e Table 9 (right), one of three Medical Research Council employees identified, by convenience, five popular and well-known male household heads who were willing to act as study seeds and who said they were confident that they could recruit other male household heads for the study.
The Medical Research Council employees were asked to select a range of male household heads within each area that approximately covered the desired range of ages and tribes. Thus, in total, a list of 50 male household heads was drawn up (five candidate seeds from each of the ten areas). Stata was then used to randomly select one seed from each of the ten areas, and the characteristics of the resulting set of ten candidate seeds were compared with the criteria. This process was repeated (with replacement) until a set of seeds matching all criteria was identified. The first seed set identified in this way was used to initiate the study.
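The seed-selection loop is a form of rejection sampling: draw one candidate per area, keep the set only if it meets every criterion, otherwise draw again. A minimal Python sketch (the study itself used Stata), assuming each candidate is a dictionary with hypothetical `age_group` and `tribe_group` keys:

```python
import random
from collections import Counter


def select_seed_set(candidates_by_area, age_groups, tribe_groups,
                    per_group=2, seed=None, max_tries=100_000):
    """Draw one candidate per area at random, repeating until the set has
    exactly `per_group` seeds in every age group and every tribe group."""
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        # one random candidate from each area; each attempt is an independent draw
        seeds = [rng.choice(cands) for cands in candidates_by_area.values()]
        ages = Counter(s["age_group"] for s in seeds)
        tribes = Counter(s["tribe_group"] for s in seeds)
        if (all(ages[g] == per_group for g in age_groups)
                and all(tribes[g] == per_group for g in tribe_groups)):
            return seeds, attempt  # the first matching set is used
    raise RuntimeError("no seed set met the criteria within max_tries")


# Hypothetical candidate lists: ten areas, two candidates each
areas = {}
for i in range(10):
    k = i // 2
    areas[f"area{i}"] = [
        {"id": f"{i}a", "age_group": k, "tribe_group": k},
        {"id": f"{i}b", "age_group": (k + 1) % 5, "tribe_group": (k + 2) % 5},
    ]
seeds, tries = select_seed_set(areas, range(5), range(5), seed=3)
```

With ten areas and five groups of two, the criteria force exactly one seed per area and an exact two-per-group split, so many draws may be rejected before a match is found.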
Seeds were given three coupons to recruit people into the study. All people receiving coupons were instructed that their potential recruits should attend for interview within seven days, although potential recruits attending after this time were also interviewed. Potential recruits arriving at the interview sites with valid coupons were assessed for eligibility using their existing general population cohort identity card or reported demographic information. If they were eligible for the study and gave consent, they were enrolled and given a first interview; they are defined as recruits in this paper. In the first interview, all recruits were asked to provide details of their relationship with their recruiter and of other male household heads they knew (their network). All recruits were also asked whether they wanted to recruit other people. If they accepted, the survey protocol specified that they would be offered three coupons to use to recruit up to three people. However, early in the survey, project staff could not cope with the rapidly increasing number of people who arrived for interviews each day (main text Figure 2a). Therefore, from the start of day nine, this protocol specification was modified so that the probability of each recruit being offered three coupons was halved from 100% to 50% (i.e. the remaining 50% were offered no coupons). When the arrival rate had decreased later in the study (start of day 32), the probability of being offered three coupons was increased from 50% back to 100%. To close the study, the probability of being offered coupons was reduced to 0% when the target sample size (900) was about to be reached. Interviews for those holding coupons continued for another seven days.
If recruits were offered and accepted coupons, they are defined as recruiters in this paper. Recruits received one primary incentive for completing the first interview; each incentive consisted of soap, salt or school books to the value of approximately US$1. Recruiters also received one secondary incentive for each person they successfully recruited. Receipt of secondary incentives was conditional on also completing a second interview, during which recruiters were asked to provide details of whom they did or did not offer their coupons to, and who accepted or rejected coupons. All recruiters were instructed that they must give out all three coupons before returning to collect their secondary incentives.
The questionnaire was programmed in Access 2003 VBA [6] on Samsung Q1 UMPCs. The protocol ensured that interviews could be carried out at any recruitment station by any interviewer. This was achieved by: downloading data from the ten fieldworkers' UMPCs each evening; reconciling the data in London; uploading an identical copy of the reconciled database to each UMPC each morning; instructing each potential recruit that they would not be interviewed until the day after they were given coupons; and instructing each recruiter that they would not be given a second interview until the day after they (the recruiter) were given coupons to give out. As is typical in respondent-driven sampling studies, members of the target group were prevented from being recruited more than once.
We defined network size in five different ways. The first network size definition (NS-1) was created to be comparable with other respondent-driven sampling studies [7, 8]. Recruits were first asked the core question "Baami bameka b'omanyi nga (i) mu myezi kkumi n'ebiri egiyise baali ba nannyinimu mu byalo bya MRC, (ii) era ng'obamanyi nabo bakumanyi, (iii) ng'obalabyeko mu week ewedde?" (How many men do you know who (i) were head of a household in the last 12 months in any of the Medical Research Council villages, (ii) you know them and they know you, and (iii) you have seen them in the past week?). We then re-asked the core question but asked the recruit to categorise the men by residence (own village or not) (NS-2), and then by residence and tribe (NS-3). Each time the question was re-asked, the recruit was reminded of their response to the previous question but was not required to reconcile inconsistent responses. We based the final two network size definitions on data collected when the recruits were asked to recall the names and/or other demographic characteristics of each individual eligible member of their network (hereafter called individual-level network members).
These details were used by the interviewer to search the general population cohort database (containing details of all men known to the Medical Research Council, irrespective of eligibility for the general population cohort or respondent-driven sampling) and attempt unique identification. If the man was positively identified as someone in the general population cohort database (hereafter called identified individual-level network members), this was recorded; otherwise, the name or nickname and/or demographic data were recorded for later analysis. Using these data, network size was also defined as the total number of individual-level network members (NS-4), and as the total number of identified individual-level network members who were eligible for the study (NS-5). By definition, the members counted in NS-5 were a subset of those counted in NS-4.
Statistical Methods
Pre-processing of the data was performed using Stata v11 (StataCorp, Texas) [9]. Networks and trees were generated using scripts written in Stata and R v2.12.0 (R Foundation, Vienna) [10] and visualised using GraphViz (AT&T Research, New Jersey) [11]. Where possible, to maximise the comparability of our methods with those used in a typical RDS study, we analysed the dataset following currently recommended statistical methods [12-14], employing RDSAT v6.0.1 [15], the custom-written software package for the analysis of respondent-driven sampling studies.
Simple sample proportions and respondent-driven sampling estimates were calculated for two different sample sizes. The first was the Full sample. The second was a Small sample consisting of the first 250 recruits (including the 10 seeds), designed to be more typical of the sample sizes used in respondent-driven sampling studies (a recent systematic review of 123 respondent-driven sampling studies found a median sample size of 247 and a mean sample size of 273 [5]).
Recruitment patterns, sample proportions, RDS-1 and RDS-2 estimates and 95% confidence intervals
Current respondent-driven sampling definitions and the statistical inference methods employed by RDSAT were used [13, 14, 16-18]. Sample proportions were calculated excluding seeds. Respondent-driven sampling transition probabilities were calculated as the proportion of each subgroup's recruits who were in each subgroup, e.g. the proportion of all recruits of Catholics who were Protestant [14]. Adjusted group network size was calculated by weighting each individual's network size by the inverse of that network size, which is equivalent to taking the harmonic mean of the individual network sizes in the group; this is the respondent-driven sampling multiplicity estimate of group network size in RDSAT terminology [16].
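As an illustration, the two quantities can be computed as follows. This is a sketch under stated assumptions, not RDSAT's code: recruitment is supplied as hypothetical (recruiter group, recruit group) pairs with seeds already removed as recruits, and every reported network size is positive. Weighting each network size by its reciprocal gives n / Σ(1/dᵢ), the harmonic mean.

```python
from collections import Counter, defaultdict


def transition_probabilities(pairs):
    """pairs: iterable of (recruiter_group, recruit_group).
    Returns S[a][b] = proportion of group a's recruits who are in group b."""
    counts = defaultdict(Counter)
    for recruiter_group, recruit_group in pairs:
        counts[recruiter_group][recruit_group] += 1
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}


def adjusted_group_network_size(degrees_by_group):
    """Mean network size weighted by 1/size, i.e. the harmonic mean."""
    return {g: len(ds) / sum(1 / d for d in ds)
            for g, ds in degrees_by_group.items()}


S = transition_probabilities([("C", "C"), ("C", "P"), ("C", "P"), ("P", "C")])
D = adjusted_group_network_size({"A": [2, 4, 4]})
```

The harmonic mean is pulled towards the smaller network sizes, which offsets the tendency of respondent-driven sampling to reach people with large networks more often.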
RDS-1 estimates were calculated in RDSAT by solving, using the least squares algorithm, the set of simultaneous linear equations that (under respondent-driven sampling theory) relate estimated network sizes, estimated proportions and transition probabilities [14]. 95% confidence intervals were generated using the modified bootstrap method employed by RDSAT, which partly mimics the respondent-driven sampling recruitment process [17]. Using this method, for any characteristic, the sample is divided into groups according to the group of the person who recruited them; e.g. for HIV status, into three groups: those recruited by an HIV-positive, an HIV-negative and an HIV-unknown recruiter [17]. A seed is then chosen with uniform probability from the entire sample, e.g. an HIV-positive seed. The next person is selected from the group that was recruited by people in the same group as the seed, i.e. in this example, by HIV-positive people. If this new person is HIV-negative, the next person is selected from the group who were recruited by HIV-negative people, and so on. This continues until the bootstrap sample is the same size as the original sample, and the respondent-driven sampling estimator is then applied to the bootstrap sample. For each bootstrap sample, RDS-1 estimates were calculated. The 2.5th and 97.5th percentiles of 20,000 bootstrap samples were used to construct 95% confidence intervals.
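The reciprocity idea behind RDS-1 can be sketched as a small least-squares problem: for every pair of groups i and j, the estimated number of ties from i to j (P_i·D_i·S_ij) should equal the number from j to i (P_j·D_j·S_ji), and the proportions must sum to one. The sketch below uses NumPy's `numpy.linalg.lstsq` and is an illustration under assumptions; RDSAT's own implementation (for example any data smoothing it applies) may differ. For two groups it reproduces the closed form P_A = S_BA·D_B / (S_AB·D_A + S_BA·D_B).

```python
import numpy as np


def rds1_estimate(S, D):
    """S[i][j]: transition probability from group i to group j;
    D[i]: adjusted network size of group i.
    Solves P_i*D_i*S_ij - P_j*D_j*S_ji = 0 for all i < j, plus sum(P) = 1,
    in the least-squares sense."""
    S = np.asarray(S, dtype=float)
    D = np.asarray(D, dtype=float)
    k = len(D)
    rows, rhs = [], []
    for i in range(k):
        for j in range(i + 1, k):
            row = np.zeros(k)
            row[i] = D[i] * S[i, j]   # ties sent from i to j
            row[j] = -D[j] * S[j, i]  # ties sent from j to i
            rows.append(row)
            rhs.append(0.0)
    rows.append(np.ones(k))  # proportions sum to one
    rhs.append(1.0)
    P, *_ = np.linalg.lstsq(np.vstack(rows), np.array(rhs), rcond=None)
    return P


# Hypothetical two-group example
P = rds1_estimate([[0.6, 0.4], [0.3, 0.7]], [10, 5])
```

With more than two groups the pairwise equations generally cannot all hold exactly, which is why a least-squares solution is needed.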
Root mean squared errors were calculated for the difference between the population proportions and the full and small sample proportions, and for the difference between the population proportions and the RDS-1 and RDS-2 estimates, for each variable and in total. As RDS-1 estimates could not be calculated for the variable village using the small sample, village was not included in the total root mean squared error for RDS-1 using the small sample. Therefore, the total root mean squared error for the small sample proportions was calculated twice: including village (to allow comparison with the total RDS-2 root mean squared error) and excluding village (to allow valid comparison with the total RDS-1 root mean squared error).
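A sketch of the root mean squared error calculation, assuming that estimates and population proportions are stored as nested dictionaries keyed by variable and category (a hypothetical layout). The supplement does not state exactly how the total was pooled. Here it is assumed to be the RMSE over all categories of all included variables, with an `exclude` argument that drops village as described above.

```python
from math import sqrt


def rmse(estimates, population):
    """RMSE between estimated and population proportions over the
    categories of one variable."""
    diffs = [estimates[c] - population[c] for c in population]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


def total_rmse(estimates_by_var, population_by_var, exclude=()):
    """RMSE pooled over every category of every variable not in `exclude`
    (pooling rule is an assumption)."""
    diffs = [estimates_by_var[v][c] - population_by_var[v][c]
             for v in population_by_var if v not in exclude
             for c in population_by_var[v]]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical proportions
est = {"age": {"a": 0.5, "b": 0.5}, "village": {"x": 1.0, "y": 0.0}}
pop = {"age": {"a": 0.4, "b": 0.6}, "village": {"x": 0.5, "y": 0.5}}
with_village = total_rmse(est, pop)
without_village = total_rmse(est, pop, exclude=("village",))
```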
The RDS-2 point estimator weights individual-level data by the reciprocal of each respondent's reported network size, to adjust for the expected over-recruitment of individuals with large networks [18]. RDS-2 point estimates were calculated, excluding seeds, using R. 95% CIs were estimated using the method described above, with RDS-2 estimates (instead of RDS-1 estimates) calculated for each bootstrap sample.
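The RDS-2 estimator and the recruitment-chain bootstrap described above can be sketched together in Python (the study used R and RDSAT). Assumptions: seeds are excluded, every respondent has a recorded recruiter group, network sizes are positive, and if a group recruited nobody the chain falls back to drawing from the whole sample (the supplement does not say how this case is handled). Percentiles are taken with `statistics.quantiles`; `n_boot` defaults to 2,000 for speed, whereas the study used 20,000.

```python
import random
from statistics import quantiles


def rds2_estimate(groups, degrees, target):
    """RDS-2 proportion in `target`: each respondent weighted by 1/network size."""
    num = sum(1.0 / d for g, d in zip(groups, degrees) if g == target)
    den = sum(1.0 / d for d in degrees)
    return num / den


def rds_bootstrap_ci(groups, recruiter_groups, estimator, n_boot=2000, seed=None):
    """Percentile 95% CI from a bootstrap that follows recruitment chains.
    groups[i]: respondent i's group; recruiter_groups[i]: group of i's recruiter;
    estimator(list_of_indices) -> estimate for that bootstrap sample."""
    rng = random.Random(seed)
    n = len(groups)
    pools = {}  # recruiter group -> indices of people that group recruited
    for i, g in enumerate(recruiter_groups):
        pools.setdefault(g, []).append(i)
    estimates = []
    for _ in range(n_boot):
        current = rng.randrange(n)  # bootstrap "seed": uniform over the sample
        chain = [current]
        while len(chain) < n:
            # next person comes from those recruited by the current person's group
            pool = pools.get(groups[current]) or range(n)  # assumed fallback
            current = rng.choice(pool)
            chain.append(current)
        estimates.append(estimator(chain))
    cuts = quantiles(estimates, n=40, method="inclusive")  # every 2.5%
    return cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles


# Hypothetical data: group, recruiter's group and network size per respondent
g = ["A", "B", "B", "A", "B", "A"]
r = ["A", "A", "B", "B", "A", "B"]
d = [2, 4, 4, 3, 5, 2]
point = rds2_estimate(g, d, "A")
lo, hi = rds_bootstrap_ci(
    g, r, lambda s: rds2_estimate([g[i] for i in s], [d[i] for i in s], "A"),
    n_boot=500, seed=1)
```

Passing the estimator as a function lets the same resampling routine produce RDS-1, RDS-2 or simple sample-proportion intervals.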
For comparison with the RDS-1 and RDS-2 estimates, we calculated recruitment probabilities for the target population, including seeds, using predictions from a logistic regression model [19], and used these as weights. The outcome was recruitment into the full sample for estimates using data from the full sample, and recruitment into the small sample for estimates using data from the small sample. Variables were included if they were statistically significant at the 5% level.
Two methods were used to determine whether equilibrium had been reached. The first was based on the methods employed by RDSAT [13, 14, 16]. This method simulates recruitment for a hypothetical sample, assuming that all of the seeds were homogeneous for a variable, and uses the sample recruitment probabilities to calculate the expected sample proportions in each wave. The number of waves required to reach equilibrium for each variable was calculated as the number of waves needed for the proportions in each wave to change by less than 2% relative to the proportions in the previous wave. Because this number differed depending on the subgroup chosen for the initial seed, the largest number of waves required was reported. A limitation of this method is that it does not take into account random variation in recruitment or the actual sample proportions by wave. The second method was to calculate recruitment weights as the ratio of the equilibrium proportions to the sample proportions (excluding seeds) for each group [20]. Equilibrium proportions are calculated by simulating recruitment using the sample recruitment probabilities. Recruitment weights far from one suggest that the sample has not reached equilibrium for that group. Equilibrium was assumed to have been reached if the ratio was within the range 0.90 to 1.10.
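Both equilibrium checks can be sketched with a row-stochastic transition matrix S, where S[i][j] is the proportion of group i's recruits who are in group j. This is a simplified illustration, not RDSAT's code: wave proportions are propagated deterministically, the 2% rule is applied as a relative change in every group, and the equilibrium proportions are found by power iteration, which assumes the recruitment chain converges (for example, is not periodic).

```python
def next_wave(p, S):
    """Expected group proportions in the next wave."""
    k = len(p)
    return [sum(p[i] * S[i][j] for i in range(k)) for j in range(k)]


def waves_to_equilibrium(S, tol=0.02, max_waves=1000):
    """Largest, over homogeneous starting seed groups, of the first wave whose
    proportions all differ from the previous wave's by < tol (relative)."""
    k = len(S)
    worst = 0
    for start in range(k):
        p = [1.0 if i == start else 0.0 for i in range(k)]  # all seeds in `start`
        for wave in range(1, max_waves + 1):
            q = next_wave(p, S)
            if all(p[j] > 0 and abs(q[j] - p[j]) / p[j] < tol for j in range(k)):
                break
            p = q
        worst = max(worst, wave)  # equals max_waves if never converged
    return worst


def equilibrium(S, iters=10_000, tol=1e-12):
    """Equilibrium proportions by repeated simulated recruitment."""
    k = len(S)
    p = [1.0 / k] * k
    for _ in range(iters):
        q = next_wave(p, S)
        if max(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    return p


def recruitment_weights(S, sample_props):
    """Ratio of equilibrium to sample proportions, group by group."""
    return [e / s for e, s in zip(equilibrium(S), sample_props)]


S = [[0.6, 0.4], [0.3, 0.7]]  # hypothetical transition probabilities
weights = recruitment_weights(S, [0.5, 0.5])
reached = [0.90 <= w <= 1.10 for w in weights]
```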
The mixing pattern between population subgroups was summarised using the respondent-driven sampling measure homophily [14, Equation 19]. Homophily (H) was defined to be equal to one if all the recruits of a group were within that group, minus one if all the recruits of that group were outside that group, and zero if the proportion of that group's recruits who were within the group was equal to the RDS-1 estimate for that group. Our (arbitrary) cut-off for high or low within-group recruitment was H ≥ 0.1 or H ≤ -0.1, among groups of size >25. To test the respondent-driven sampling assumption that recruitment is random from the recruiter's reported network, expected recruitment matrices were calculated for each variable using data collected in the recruiters' first interviews on identified individual-level network members who were members of the target population. The data were weighted by the number of recruits of the recruiter (i.e. data from recruiters who recruited three recruits were given three times the weight of data from recruiters who recruited only one).
Age groups 0-19 and 20-29 were combined, and the religion category Other known/none/unknown was excluded, because of zero values in the expected recruitment matrices. The expected recruitment matrices were compared with the actual recruitment matrices, and a chi-squared test was used to test for evidence against random recruitment from the recruiter's reported network.
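The homophily measure and the comparison of expected with actual recruitment can be sketched as below. The `homophily` function uses a piecewise form consistent with the definition above (one, minus one, and zero at the stated limits) and requires 0 < p_i < 1. The expected matrix assumes each recruiter contributes their number of recruits, spread evenly over the groups of their identified network members, so that row totals match the actual matrix. This layout and the statistic Σ(O - E)²/E over cells with E > 0 are assumptions, since the supplement does not give the exact test or its degrees of freedom; a p-value could then be obtained from, e.g., `scipy.stats.chi2.sf(stat, df)` for a chosen df.

```python
from collections import defaultdict


def homophily(s_ii, p_i):
    """s_ii: proportion of group i's recruits who are in group i;
    p_i: RDS-1 estimate for group i (0 < p_i < 1)."""
    if s_ii >= p_i:
        return (s_ii - p_i) / (1 - p_i)  # 1 when all recruits are in-group
    return (s_ii - p_i) / p_i            # -1 when none are in-group


def expected_recruitment(recruiters):
    """recruiters: iterable of (recruiter_group, n_recruits, contact_groups).
    Each recruiter's recruits are spread over their contacts' groups."""
    expected = defaultdict(lambda: defaultdict(float))
    for group, n_recruits, contacts in recruiters:
        for c in contacts:
            expected[group][c] += n_recruits / len(contacts)
    return expected


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over cells with E > 0."""
    stat = 0.0
    for a, row in expected.items():
        for b, e in row.items():
            if e > 0:
                o = observed.get(a, {}).get(b, 0)
                stat += (o - e) ** 2 / e
    return stat


# Hypothetical recruiters and observed recruitment counts
exp = expected_recruitment([("A", 2, ["A", "B"]), ("B", 1, ["B"])])
stat = chi_squared({"A": {"A": 2}, "B": {"B": 1}}, exp)
```

Cells with an expected count of zero cannot enter this statistic, which is one reason sparse categories are merged or dropped before testing.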
We explored the robustness of our results to any bias in network size estimates caused by under-reporting by re-calculating RDS-1 and RDS-2 estimates for the full sample using network size data from subsets of the sample that were less likely to have been affected by this potential source of bias. These subsets were: 1) men recruited during the first five weeks of the study (mean network size fell slightly between weeks 5 and 8); 2) men interviewed at interview sites 1 and 3 only (qualitative data showed that staff at interview site 2 unofficially started requiring their respondents to give at least 10 contacts in response to perceived reductions in reported network size); and 3) men who responded to the respondent-driven sampling interview question "How did your recruiter persuade you to come today?" by saying that their recruiters had told them nothing about the study, or had told them only about the incentives. There were no recruits in the subgroups aged <20 years and religion Other/none who reported that they had been given no information about the interview by their recruiters; the mean network sizes for these subgroups were therefore calculated from the reported network sizes of all recruits for subset 3. Estimates were not calculated for the variable village because of the high proportion of villages with few or no recruits meeting the requirements for subsets 1, 2 and 3.
To test whether biases in the unadjusted and adjusted estimates for socioeconomic status were due to an association between socioeconomic status and age combined with biases in recruitment by age, unadjusted and adjusted estimates for socioeconomic status were calculated separately by age group. Age groups 0-19 and 20-29 were combined because of the small number of recruits aged 0-19. Combined estimates were produced by combining the estimates by age group, weighted according to the population proportions in each age group.
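The combined estimate is a weighted average of the age-specific estimates. A minimal sketch with hypothetical age labels and numbers; weights are renormalised over the age groups supplied, which matters when groups have been merged or are missing.

```python
def age_standardised(estimates_by_age, pop_props_by_age):
    """Combine age-specific estimates (e.g. the proportion in one
    socioeconomic group) using population age proportions as weights."""
    total = sum(pop_props_by_age[a] for a in estimates_by_age)
    return sum(estimates_by_age[a] * pop_props_by_age[a]
               for a in estimates_by_age) / total


combined = age_standardised({"<30": 0.2, "30-39": 0.4},
                            {"<30": 0.5, "30-39": 0.5, "40+": 0.0})
```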
Spatial analysis
Geographic plots were produced in ArcGIS 9.2 [21], and distances between villages were calculated using ArcMap as the minimum distance between the main village meeting points along well-established paths and roads.
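The study computed these path distances in ArcMap. Purely to illustrate the quantity, the minimum distance along a network of paths can be computed with Dijkstra's algorithm. The edge list and place names below are hypothetical, and this is a swapped-in technique, not the ArcMap procedure.

```python
import heapq


def shortest_path_km(edges, source, target):
    """Minimum distance from source to target along undirected path segments.
    edges: iterable of (place_u, place_v, length_km)."""
    graph = {}
    for u, v, km in edges:
        graph.setdefault(u, []).append((v, km))
        graph.setdefault(v, []).append((u, km))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # target not reachable


edges = [("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 5.0)]
km = shortest_path_km(edges, "A", "C")
```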
Simple random sample of men not recruited by respondent-driven sampling
To compare the network size of the whole target population with that of the respondent-driven sampling recruits, 300 men in the target population who had not been recruited in the respondent-driven sampling study were randomly selected to be interviewed using the first respondent-driven sampling questionnaire. The size of the eligible population was 1475 (i.e. 2402 - 927, the number recruited by respondent-driven sampling). A t-test was used to test for differences between means.
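The supplement does not state whether a pooled-variance or an unequal-variance t-test was used. The sketch below computes Welch's unequal-variance t statistic and its approximate (Welch-Satterthwaite) degrees of freedom with the standard library; for a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test. The input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1, 2, 3], [4, 5, 6])
```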
A minimum estimate of the proportion of the target population that was in a single connected network was obtained by calculating the proportion of the target population who were named as a contact by at least one respondent-driven sampling recruit, or by at least one member of the simple random sample who was themselves named as a contact by a respondent-driven sampling recruit.
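The lower bound is a set calculation. A sketch, assuming contacts are stored as hypothetical dictionaries mapping each interviewee's ID to the set of IDs they named:

```python
def min_connected_proportion(population, rds_contacts, srs_contacts):
    """population: set of target-population IDs.
    rds_contacts / srs_contacts: dict of interviewee ID -> set of named IDs."""
    named_by_rds = set().union(*rds_contacts.values()) & population
    covered = set(named_by_rds)
    for member, contacts in srs_contacts.items():
        # only simple-random-sample members who were themselves named count
        if member in named_by_rds:
            covered |= contacts & population
    return len(covered) / len(population)


share = min_connected_proportion(set(range(1, 11)),
                                 {1: {2, 3}, 2: {4}},
                                 {5: {6}, 3: {7, 8}})
```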
Qualitative survey
To help understand the quantitative study findings, 54 members of the population in the study villages or Medical Research Council staff were selected for qualitative interview. The groups sampled, sample sizes and sampling methods were: 1) 10 respondent-driven sampling recruits, randomly selected from 917 eligible (excluding seeds); 2) 10 men who were reported by recruiters as having refused coupons and who we knew had not enrolled in the respondent-driven sampling study (refusers), randomly selected from 29 eligible; 3) 10 community members (men and women) who were not respondent-driven sampling recruits or refusers, randomly selected from 8695 eligible; 4) 10 key informants from the study population, selected purposively; 5) all 10 respondent-driven sampling interviewers; 6) 2 general population cohort census survey staff, randomly selected from 8 eligible; and 7) 2 general population cohort medical survey staff, randomly selected from 17 eligible.
Ethical approval
The Science and Ethics Committee of the Uganda Virus Research Institute (GC/l27109108), the Uganda National Council for Science and Technology (SS2278) and the London School of Hygiene and Tropical Medicine Ethics Committee (5585) gave ethical approval for the study.
Supporting Results
Seed selection
All a priori seed selection criteria (assuming limited knowledge) were met (eTable 1). Two seeds were selected from each age group and each tribe group. When GPS data were used to examine the actual positions of seed households, their geographic distribution was slightly more uneven than expected (main paper Figure 1, seeds shown as black triangles).
Simple random sample survey
In total, 1475 (2402 - 927) men were eligible for the simple random sample, and 55% (164/300) of the men selected completed the interview. The reasons for non-interview are shown in eTable 10.
Qualitative survey
Of the 54 members of the population in the study villages or Medical Research Council staff selected for qualitative interview, 53 were interviewed: 10 of 10 respondent-driven sampling recruits, 10 of 10 men who were offered coupons but did not enrol in the respondent-driven sampling study (refusers), 10 of 10 community members (men and women) who were neither respondent-driven sampling recruits nor refusers, 10 of 10 key informants, 9 of 10 respondent-driven sampling interviewers (one declined because of being too busy), 2 of 2 general population cohort census survey staff, and 2 of 2 general population cohort medical survey staff. During analysis, four refusers were found to have been ineligible and their data were removed, leaving six valid interviews from this group. The final sample size was therefore 49.
Recruitment pattern
A video illustrating recruitment in space and time is provided in Video1.avi. There was very strong evidence against random recruitment from reported contacts by age (p<0.001) (eTable 5). Compared with reported contacts, younger men were over-recruited. This is more likely to reflect a bias against reporting young men as household heads than a genuine over-recruitment of younger men, because younger men were under-represented in the respondent-driven sampling sample. There was strong evidence that recruitment was not random by tribe (p<0.001). Tribes that made up a smaller proportion of the eligible population tended to over-recruit from their own tribe by a larger amount (Kiga by 300%, Rundi by 67% and Rwanda/kole by 17%), whereas Ganda under-recruited from their own tribe by 6%. There was good evidence against random recruitment by religion (p=0.01), due largely to over-recruitment of Protestants by Muslims.
There was strong evidence that recruiters did not recruit randomly by village (p<0.001) (eTable 6). Recruiters in 11 of the 25 villages over-recruited from their own village. Recruiters in villages with a larger number of other study villages within 3km tended to over-recruit less from their own village (correlation -0.42, p=0.04; eFigure 5). Most recruits (70.6%) were recruited by recruiters who lived in the same village, 24% by recruiters who lived in villages within 3km of their village, and 5% by recruiters living in villages more than 3km away. A map and recruitment networks showing the recruitment pattern by village are shown in eFigure 6. Recruitment networks showing whether recruits were offered and accepted coupons are shown in eFigure 7.
There was very strong evidence against random recruitment by socioeconomic status (p<0.001). Men in the lowest two socioeconomic groups were over-recruited, and men in the highest two groups (and men of unknown socioeconomic status) were under-recruited. The over- and under-recruitment was greatest for men in the highest and lowest socioeconomic groups and for men of unknown socioeconomic status. There was very strong evidence against random recruitment by number of sexual partners (p<0.001), due largely to under-recruitment of men with an unknown number of partners and over-recruitment of men with zero sexual partners. The latter may reflect over-recruitment of older men, as a higher proportion of older men than younger men reported zero sexual partners (23% of men aged 50+ years compared with 6% of those aged <50 years). There was no evidence against random recruitment by HIV status (p=0.1).
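eTable 5 states that the p-values come from a chi-squared test comparing observed recruitment with the recruitment expected from identified network members. The sketch below assumes that each recruiter group's recruits are redistributed in proportion to the contacts that group named. All counts are hypothetical. The diagonal ratio observed/expected - 1 is one plausible reading of figures such as "over-recruit from their own tribe by 300%". Converting the statistic to a p-value needs degrees of freedom, which the text does not state, so that step is left out.

```python
def expected_from_contacts(observed, contacts):
    """Expected recruitment matrix if recruiters chose at random among their named contacts.

    observed[i][j]: recruits in group j recruited by recruiters in group i.
    contacts[i][j]: contacts in group j named by recruiters in group i.
    """
    expected = []
    for obs_row, con_row in zip(observed, contacts):
        n_recruited, n_named = sum(obs_row), sum(con_row)
        expected.append([n_recruited * c / n_named for c in con_row])
    return expected


def chi_squared(observed, expected):
    """Pearson chi-squared statistic summed over all cells.

    Categories with zero expected counts must be dropped first (as was done for
    one religion category in eTable 5); otherwise this raises ZeroDivisionError.
    """
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


def own_group_excess(observed, expected):
    """Relative over-recruitment from one's own group: observed / expected - 1."""
    return [observed[i][i] / expected[i][i] - 1 for i in range(len(observed))]


observed = [[30, 10], [15, 25]]   # hypothetical two-group recruitment counts
contacts = [[50, 50], [40, 60]]   # hypothetical named-contact counts
expected = expected_from_contacts(observed, contacts)
print(round(chi_squared(observed, expected), 2))   # 10.1
print(own_group_excess(observed, expected))        # first group: 0.5, i.e. 50% over-recruitment
```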
Comparison with target population data
The root mean squared error (RMSE) for the difference between the true population proportions and the respondent-driven sampling estimates was 6.9% for the RDS-1 estimates and 6.6% for the RDS-2 estimates using the full sample, and 7.4% for both the RDS-1 and RDS-2 estimates using the small sample (eTable 7). The RMSE was largest for HIV status for both estimators and both sample sizes. It was smallest for religion (RDS-1, full sample), tribe (RDS-1, small sample), village (RDS-2, full sample) and tribe (RDS-2, small sample).
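The per-variable values and totals in eTable 7 can be checked against each other. The sketch below assumes that each variable's RMSE is taken over its own categories and that the totals pool all 52 categories. Under that assumption, weighting each squared RMSE by its number of categories reproduces the full-sample totals of 6.35%, 6.87% and 6.56%. The category counts are taken from the denominators in eTable 11.

```python
import math


def pooled_rmse(rmse_by_variable, categories_by_variable):
    """Combine per-variable RMSEs into a single RMSE over all categories.

    A variable's squared RMSE is its mean squared error per category, so
    weighting by its number of categories recovers the overall mean.
    """
    total_sq = sum(categories_by_variable[v] * rmse_by_variable[v] ** 2
                   for v in rmse_by_variable)
    total_n = sum(categories_by_variable[v] for v in rmse_by_variable)
    return math.sqrt(total_sq / total_n)


# Full-sample unadjusted sample proportions, RMSE in percentage points (eTable 7).
sample_rmse = {"age": 4.99, "tribe": 2.20, "religion": 1.81, "ses": 4.72,
               "village": 1.75, "partners": 12.10, "hiv": 18.40}
# Number of categories per variable (denominators in eTable 11; 52 in total).
n_categories = {"age": 5, "tribe": 5, "religion": 4, "ses": 5,
                "village": 25, "partners": 5, "hiv": 3}
print(round(pooled_rmse(sample_rmse, n_categories), 2))  # 6.35
```

Village has 25 categories, so its small errors pull the pooled figure down. This is consistent with the small-sample total rising from 7.00% to 8.88% when village is excluded.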
Sensitivity to different network size definitions
For the full sample, the RDS-1 adjusted estimates were closer to the true population proportions than the sample proportions were for 37% (19 of 52) of categories using network size definition NS-1, 33% (17 of 52) using NS-4 and 35% (18 of 52) using NS-5. For the smaller sample, the corresponding figures were 27% (7 of 26), 35% (9 of 26) and 38% (10 of 26). The RDS-2 adjusted estimates were closer for 33% (17 of 52) of categories using each of NS-1, NS-4 and NS-5 for the full sample. For the smaller sample, they were closer for 35% (18 of 52) using NS-1 and 31% (16 of 52) using NS-4 and NS-5 (eTable 11).
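This sketch assumes that RDS-2 refers to the Volz-Heckathorn inverse-degree estimator (reference 18). Each respondent is weighted by the reciprocal of their reported network size, which is why changing the network size definition can move the estimates. The respondents and network sizes below are hypothetical.

```python
def rds2_estimate(groups, degrees, category):
    """Inverse-degree weighted estimate of the population share in `category`.

    groups:  each respondent's category (e.g. tribe), in sample order.
    degrees: each respondent's reported network size (must be > 0), same order.
    """
    weights = [1 / d for d in degrees]
    in_category = sum(w for g, w in zip(groups, weights) if g == category)
    return in_category / sum(weights)


groups = ["A", "A", "B", "B"]
sizes_def_1 = [10, 10, 5, 5]      # hypothetical sizes under one definition
sizes_def_2 = [10, 10, 10, 10]    # hypothetical sizes under another definition
print(rds2_estimate(groups, sizes_def_1, "A"))  # about 0.333: low-degree B members weigh more
print(rds2_estimate(groups, sizes_def_2, "A"))  # 0.5: equal weights give the sample share
```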
Sensitivity of our results to potential bias in network size estimates
Mean network size rose slightly from 11.8 in week 1 to 13.8 in week 5 and subsequently fell to 10.3 in week 8. There was very strong evidence of a higher mean network size among men interviewed at interview sites 1 and 3 than at site 2 (12.8 vs 11.0, p<0.001). There was also very strong evidence that a higher proportion of recruits reported a network size of exactly 10 at interview site 2 than at sites 1 and 3 (28% vs 15%, p<0.001). There was weak evidence of a slightly higher mean network size among recruits whose recruiters had told them that 'there would be questions' than among recruits whose recruiters had told them nothing about the study or only about the incentives (12.5 vs 11.7, p=0.1).
RDS-1 and RDS-2 estimates for the full sample that were generated using mean network sizes from subsets of the sample were generally slightly worse than estimates using network size data from the whole sample (not shown). There were two exceptions: the RDS-1 estimates using network size data that excluded site 2, and the RDS-2 estimates using network size data from the first five weeks of the study. In both cases, using the subset network size data rather than the whole sample improved just over half of the estimates (56%, 15/27). However, there was no evidence that this proportion was significantly greater than 50% (p=0.6). For the other subsets, the estimates were closer for 33% to 41% (9 to 11 out of 27) of subgroups. This may be due to chance (p=0.08-0.3). Alternatively, the subset average network sizes were calculated from fewer observations and were therefore more variable, which would make the RDS adjustments larger on average.
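The text does not name the test behind p=0.6 and p=0.08-0.3. The reported values are consistent with a two-sided one-sample z-test of a proportion against 0.5, using the normal approximation without continuity correction. The sketch below assumes that test.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, p0=0.5):
    """Two-sided one-sample z-test of a proportion (normal approximation, no continuity correction)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Number of subgroups (out of 27) for which the subset-based estimate was closer.
for improved in (15, 9, 11):
    print(improved, round(proportion_z_test(improved, 27), 2))
# 15 0.56
# 9 0.08
# 11 0.34
```

With only 27 subgroups, an exact binomial test would be a reasonable alternative. For 15/27 it gives a larger two-sided p-value (about 0.70), so the conclusion does not change.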
Socio-economic status by age group
Controlling for age group brought 40% (2 out of 5) of the sample proportions for socio-economic status closer to the true population proportions (eTable 12). After controlling for age group, all five (100%) RDS-1 estimates were closer to the true population proportions than the non-age-adjusted RDS-1 estimates were, and 40% (2 out of 5) were closer than the age-adjusted sample proportions were. In both the sample proportions and the RDS-1 estimates, men in the highest socio-economic group remained under-represented and men in the lowest group over-represented after adjusting for age group. The values (population proportion, age-adjusted sample proportion, age-adjusted RDS-1 estimate) were 26%, 18% and 18% for the highest socio-economic group, and 21%, 26% and 30% for the lowest.
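The "closer to the population proportion" comparison is made category by category. As a check, the sketch below applies it to the age-adjusted "Combined" rows of eTable 12, comparing each RDS-1 estimate with the age-adjusted sample proportion. It recovers the 2 out of 5 reported above. The function name is hypothetical.

```python
# Age-adjusted "Combined" rows of eTable 12: (population, sample, RDS-1 estimate).
combined = {
    "Highest": (0.257, 0.178, 0.179),
    "Higher":  (0.249, 0.242, 0.232),
    "Lower":   (0.229, 0.279, 0.263),
    "Lowest":  (0.214, 0.256, 0.302),
    "Unknown": (0.052, 0.044, 0.025),
}


def closer_categories(rows):
    """Categories where the estimate is strictly closer to the population value than the sample is."""
    return [name for name, (pop, sample, estimate) in rows.items()
            if abs(estimate - pop) < abs(sample - pop)]


improved = closer_categories(combined)
print(improved, f"{len(improved)}/{len(combined)}")  # ['Highest', 'Lower'] 2/5
```

The "Highest" category counts as improved by a margin of only 0.001. This illustrates how small differences can drive these percentages.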
Comment on number of men who were reported to have accepted more than one coupon
Analysis of the data on identified individual-level network members, collected from recruiters who returned for the second interview, showed that 92 men had accepted coupons from more than one recruiter (84 from two, seven from three and one from four). Only 16 men were found to be ineligible because of previous recruitment, so most of these men did not attempt re-recruitment. The true number of people in the target population who accepted coupons from more than one recruiter is likely to have been higher. Only 66% of recruiters returned for a follow-up interview, and only 68% of the people in the target population who were given coupons by these recruiters were identified.
Equilibrium
Using the method employed by RDSAT, the number of waves required to reach equilibrium was four for socio-economic status and five for religion for both sample sizes, and at least 500 for village using the full sample (eTable 3 and eTable 4). The estimated number of waves differed between the full and small samples for HIV status (three for the full sample and four for the small sample), age group (four and three), tribe (five and seven) and number of sexual partners (three and four). This dependence on sample size illustrates one of the problems with this method. There were 16 waves of recruitment in the full sample and 6 in the smaller sample. By this method, therefore, equilibrium was reached for all variables except village for both sample sizes, and possibly tribe for the small sample.
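One way to approximate this kind of calculation treats the recruitment matrix as a Markov chain. The chain starts from the seeds' composition, and the sketch counts waves until every group's share is within a tolerance of the equilibrium composition. RDSAT's exact rule and default tolerance may differ; `tol=0.02`, the function names and the two-group matrix are assumptions for illustration.

```python
def step(dist, transition):
    """One recruitment wave: new_dist[j] = sum over i of dist[i] * transition[i][j]."""
    k = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(k)) for j in range(k)]


def equilibrium(transition, iterations=10_000):
    """Stationary composition, found by running the chain for many waves from a uniform start."""
    k = len(transition)
    dist = [1 / k] * k
    for _ in range(iterations):
        dist = step(dist, transition)
    return dist


def waves_to_equilibrium(seed_dist, transition, tol=0.02, max_waves=1000):
    """Waves until every group's share is within `tol` of equilibrium (None if never reached)."""
    target = equilibrium(transition)
    dist = list(seed_dist)
    for wave in range(max_waves + 1):
        if max(abs(a - b) for a, b in zip(dist, target)) < tol:
            return wave
        dist = step(dist, transition)
    return None


transition = [[0.7, 0.3], [0.4, 0.6]]  # hypothetical: row i = share of group-i recruits in each group
seeds = [1.0, 0.0]                      # all seeds in the first group
print(equilibrium(transition))           # about [0.571, 0.429], i.e. 4/7 and 3/7
print(waves_to_equilibrium(seeds, transition))  # 3
```

Strong homophily makes the distance to equilibrium shrink slowly with each wave. This is why a variable such as village can need hundreds of waves.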
Using the second method (Frost et al. [20]), recruitment weights for the full sample ranged from 0.93 to 1.01 for tribe, 0.99 to 1.05 for religion, 1.00 to 1.01 for socioeconomic status, 0.94 to 1.02 for age group, 0.03 to 6.01 for village and 0.97 to 1.01 for HIV status, and were all 1.00 for number of sexual partners (eTable 3 and eTable 4). For the smaller sample, they ranged from 0.62 to 1.08 for tribe, 0.99 to 1.05 for religion, 0.97 to 1.04 for socioeconomic status, 0.98 to 1.03 for age group, 0.00 to 13.13 for village, 0.93 to 1.02 for HIV status, and 0.97 to 1.02 for number of sexual partners. Taking weights between 0.90 and 1.10 to indicate equilibrium (eTable 3), this suggests that equilibrium may not have been reached for village with either sample size, or for tribe with the smaller sample.
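The conclusion above can be checked mechanically against the 0.90 to 1.10 band that eTable 3 uses to indicate equilibrium. This sketch flags any variable whose reported weight range leaves that band. The ranges are those given in the paragraph above, and the function name is hypothetical.

```python
# (minimum, maximum) recruitment weight per variable, as reported above.
full = {"tribe": (0.93, 1.01), "religion": (0.99, 1.05), "ses": (1.00, 1.01),
        "age group": (0.94, 1.02), "village": (0.03, 6.01), "hiv": (0.97, 1.01),
        "sexual partners": (1.00, 1.00)}
small = {"tribe": (0.62, 1.08), "religion": (0.99, 1.05), "ses": (0.97, 1.04),
         "age group": (0.98, 1.03), "village": (0.00, 13.13), "hiv": (0.93, 1.02),
         "sexual partners": (0.97, 1.02)}


def outside_band(ranges, low=0.90, high=1.10):
    """Variables with at least one recruitment weight outside [low, high]."""
    return sorted(name for name, (w_min, w_max) in ranges.items()
                  if w_min < low or w_max > high)


print(outside_band(full))   # ['village']
print(outside_band(small))  # ['tribe', 'village']
```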
Respondents all linked in single network
The recruitment networks from each seed were all linked to the same overall network, and 73% of the eligible population were linked in a single network. This is likely to be an underestimate for two reasons. First, network membership data were unavailable for many members of the target population. Second, younger household heads tended not to be perceived as household heads by the target population: only 21% of eligible 0-19-year-olds and 54% of eligible 20-29-year-olds could be linked to the network, compared with 79% of eligible men aged 30+ years.
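Whether the recruitment trees from all seeds join one network is a connected-components question. Below is a minimal union-find sketch. It assumes that links are undirected pairs of IDs, such as recruiter-recruit pairs and reported contact pairs. The IDs, links and seed set are hypothetical.

```python
def find(parent, x):
    """Root of x's component, with path halving to keep trees shallow."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def components(people, links):
    """Group people into connected components; every ID in `links` must appear in `people`."""
    parent = {p: p for p in people}
    for a, b in links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    groups = {}
    for p in people:
        groups.setdefault(find(parent, p), set()).add(p)
    return list(groups.values())


people = range(1, 9)                                  # hypothetical IDs 1-8
links = [(1, 2), (2, 3), (4, 5), (3, 5), (6, 7)]      # hypothetical links
comps = components(people, links)
largest = max(comps, key=len)
seeds = {1, 4}
print(seeds <= largest, len(largest) / len(people))   # True 0.625
```

Here the two seeds' trees ({1, 2, 3} and {4, 5}) merge through the link (3, 5). Person 8 has no recorded links, so missing link data lowers the linked share, as described above.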
References
1. Shafer LA, Biraro S, Nakiyingi-Miiro J, Kamali A, Ssematimba D, Ouma J, Ojwiya A, Hughes P, Van der Paal L, Whitworth J, Opio A, Grosskurth H. HIV prevalence and incidence are no longer falling in southwest Uganda: evidence from a rural population cohort 1989-2005. AIDS 2008;22(13):1641-9.
2. Kamali A, Carpenter LM, Whitworth JA, Pool R, Ruberantwari A, Ojwiya A. Seven-year trends in HIV-1 infection rates, and changes in sexual behaviour, among adults in rural Uganda. AIDS 2000;14(4):427-34.
3. Nakibinge S, Maher D, Katende J, Kamali A, Grosskurth H, Seeley J. Community engagement in health research: two decades of experience from a research project on HIV in rural Uganda. Trop Med Int Health 2009;14(2):190-5.
4. Mbulaiteye SM, Mahe C, Whitworth JA, Ruberantwari A, Nakiyingi JS, Ojwiya A, Kamali A. Declining HIV-1 incidence and associated prevalence over 10 years in a rural population in south-west Uganda: a cohort study. Lancet 2002;360(9326):41-6.
5. Malekinejad M, Johnston L, Kendall C, Kerr L, Rifkin M, Rutherford G. Using respondent-driven sampling methodology for HIV biological and behavioral surveillance in international settings: a systematic review. AIDS and Behavior 2008;12(S1):105-130.
6. Microsoft Corporation. Microsoft Access 2003. 2003 ed. Washington, 2003.
7. McCarty C, Killworth PD, Bernard HR, Johnsen EC, Shelley GA. Comparing two methods for estimating network size. Human Organization 2001;60(1):28-39.
8. McCormick T, Salganik M, Zheng T. How many people do you know?: Efficiently estimating personal network size. Journal of the American Statistical Association 2010;105(489):59-70.
9. StataCorp. Stata Statistical Software: Release 11.0. 9 ed. College Station, Texas: Stata Press, 2010.
10. R Development Core Team. R language and environment for statistical computing and graphics. Vienna, Austria: R Foundation for Statistical Computing, 2010. http://www.R-project.org
11. Gansner ER, North SC. An open graph visualization system and its applications to software engineering. Softw Pract Exper 1999;S1:1-5.
12. Salganik MJ, Heckathorn DD. Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling. Sociological Methodology 2004;34(1):193-240.
13. Heckathorn DD. Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations. Social Problems 1997;44(2):174-199.
14. Heckathorn DD. Respondent-Driven Sampling II: Deriving Valid Population Estimates from Chain-Referral Samples of Hidden Populations. Social Problems 2002;49(1):11-34.
15. Volz E, Wejnert C, Degani I, Heckathorn D. Respondent-Driven Sampling Analysis Tool (RDSAT). 6.0.1 ed. Ithaca, NY: Cornell University, 2007.
16. Heckathorn DD. Extensions of respondent-driven sampling: analyzing continuous variables and controlling for differential recruitment. Sociological Methodology 2007;37(1):151-207.
17. Salganik MJ. Variance estimation, design effects, and sample size calculations for respondent-driven sampling. J Urban Health 2006;83(6 Suppl):i98-112.
18. Volz E, Heckathorn D. Probability Based Estimation Theory for Respondent Driven Sampling. Journal of Official Statistics 2008;24(1):79-97.
19. Kirkwood BR, Sterne JAC. Essential medical statistics. Wiley-Blackwell, 2003.
20. Frost SD, Brouwer KC, Firestone Cruz MA, Ramos R, Ramos ME, Lozada RM, Magis-Rodriguez C, Strathdee SA. Respondent-driven sampling of injection drug users in two U.S.-Mexico border cities: recruitment dynamics and impact on estimates of HIV and syphilis prevalence. J Urban Health 2006;83(6 Suppl):i83-97.
21. Environmental Systems Research Institute. ArcGIS. Version 9.2. Redlands, CA.
eTable 1 Characteristics and recruitment patterns of the ten seeds. HIV status and sexual activity omitted for confidentiality
eTable 2 Correlation coefficients among RDS recruits (including seeds) between the five measures of network size used in the study. p<0.0001 in all cases.
Measure | 1 | 2 | 3 | 4
5 | 0.754 | 0.778 | 0.880 | 0.904
4 | 0.821 | 0.850 | 0.958
3 | 0.840 | 0.886
2 | 0.963
eTable 3 Recruitment matrices and other characteristics of the RDS sample for age, tribe, religion, socioeconomic status, sexual activity and HIV status, for the full and small samples. The table shows sample proportions, equilibrium proportions, recruitment weights, unadjusted and adjusted network sizes, homophily, and the wave at which equilibrium was estimated to have been reached using the RDSAT method. Recruitment weights are shown in bold if they lie between 0.90 and 1.10, indicating that equilibrium had been reached according to the Frost method [20]. The sample size for age group is 238 rather than 240 because the two seeds in age group 0-19 were excluded to allow estimates to be calculated.
eTable 4 Recruitment matrix for village, with the equilibrium distribution, recruitment weights, network sizes and homophily of each group. -, could not be calculated.
eTable 5 Observed and expected recruitment matrices. Expected recruitment matrices were calculated from data on identified individual-level network members. P-values are calculated using a chi-squared test and indicate the strength of evidence against random recruitment. Category Other known/none/unknown was excluded for religion due to zero values in the expected recruitment matrices.
eTable 6 Observed and expected recruitment from own vs. other village. Expected recruitment was calculated from data on identified individual-level network members. P-values are calculated using a chi-squared test and indicate the strength of evidence against random recruitment.
eTable 7 Root mean squared error for the difference between the true population proportions and the sample proportions and RDS estimates. -, RDS estimates could not be calculated.
Variable | Full sample: Sample | Full sample: RDS-1 | Full sample: RDS-2 | Small sample: Sample | Small sample: RDS-1 | Small sample: RDS-2
Age group (years) | 4.99% | 5.60% | 5.77% | 6.16% | 6.99% | 6.90%
Tribe | 2.20% | 2.79% | 2.63% | 2.99% | 2.52% | 3.17%
Religion | 1.81% | 2.51% | 2.88% | 8.35% | 8.72% | 9.51%
Socio-economic status | 4.72% | 6.00% | 5.54% | 3.17% | 4.35% | 4.51%
Village | 1.75% | 3.26% | 1.95% | 3.96% | - | 4.26%
Number of sex partners in the last year | 12.10% | 12.32% | 12.15% | 11.27% | 11.73% | 11.46%
HIV status | 18.40% | 18.54% | 18.42% | 17.89% | 18.58% | 18.42%
Total | 6.35% | 6.87% | 6.56% | 7.00% (8.88% excluding village) | - / 7.40% | 7.44%
eTable 8 Target population proportions, full and small sample proportions, and regression-weight adjusted estimates with 95% confidence intervals (CIs). Regression-weight adjusted point estimates are shown in bold if they are closer to the target population proportions than the unadjusted sample proportions. CIs are shown in bold if they include the population proportion. - = could not be calculated. The full sample regression model included all variables shown except religion. The small sample regression model included all variables except religion, village and socioeconomic status. Village was excluded from the small sample regression model because no-one was recruited from two villages in the small sample, so everyone in those villages would have been excluded from the regression model if village had been included.
eTable 9 Assumed (limited) prior information on the target population (male household heads) (left) and a priori desired characteristics of the ten seeds (right). Village names removed for confidentiality.
Characteristic | Assumed (limited) prior knowledge | A priori desired characteristics of seeds
Geographic distribution | Map used by Medical Research Council mapper | One seed from within each of ten areas
Age | Information from a Medical Research Council staff member working with study villagers: most male household heads aged about 25 to 50 years; minimum about 18 years; maximum 70+ years | 10-19 yrs: 2; 20-29 yrs: 2; 30-39 yrs: 2; 40-49 yrs: 2; 50+ yrs: 2
Tribe | Information from a Medical Research Council staff member working with study villagers: most common tribe is Ganda, followed by Rwanda/kole; there are also Kiga, Rundi and other tribes in the area | Ganda: 2; Rwanda/kole: 2; Kiga: 2; Rundi: 2; Other known tribe: 2
eTable 10 Reasons for non-interview in simple random sample survey
Reason | n | %
Away | 59 | 43.4%
Refused | 26 | 19.1%
Couldn't find | 20 | 14.7%
Died | 8 | 5.9%
Health | 4 | 2.9%
Other | 19 | 14.0%
Total | 136 | 100.0%
eTable 11 Percentage of categories for which the RDS adjustments improve the estimates of the population proportions using different measures of network size. -, could not be calculated.
Variable | RDS-1 full NS-1 | RDS-1 full NS-4 | RDS-1 full NS-5 | RDS-1 small NS-1 | RDS-1 small NS-4 | RDS-1 small NS-5 | RDS-2 full NS-1 | RDS-2 full NS-4 | RDS-2 full NS-5 | RDS-2 small NS-1 | RDS-2 small NS-4 | RDS-2 small NS-5
Age group | 40.0 (2/5) | 60.0 (3/5) | 40.0 (2/5) | 0.0 (0/4) | 40.0 (2/4) | 20.0 (1/4) | 40.0 (2/5) | 40.0 (2/5) | 40.0 (2/5) | 20.0 (1/4) | 40.0 (2/4) | 40.0 (2/4)
Tribe | 20.0 (1/5) | 20.0 (1/5) | 20.0 (1/5) | 60.0 (3/5) | 60.0 (3/5) | 80.0 (4/5) | 0.0 (0/0) | 0.0 (0/0) | 0.0 (0/0) | 40.0 (2/5) | 40.0 (2/5) | 40.0 (2/5)
Religion | 25.0 (1/4) | 25.0 (1/4) | 0.0 (0/4) | 50.0 (2/4) | 50.0 (2/4) | 50.0 (2/4) | 25.0 (1/4) | 25.0 (1/4) | 25.0 (1/4) | 25.0 (1/4) | 25.0 (1/4) | 25.0 (1/4)
SES | 20.0 (1/5) | 20.0 (1/5) | 20.0 (1/5) | 20.0 (1/5) | 0.0 (0/5) | 20.0 (1/5) | 20.0 (1/5) | 20.0 (1/5) | 20.0 (1/5) | 20.0 (1/5) | 20.0 (1/5) | 20.0 (1/5)
Village | 40.0 (10/25) | 36.0 (9/25) | 40.0 (10/25) | - | - | - | 36.0 (9/25) | 36.0 (9/25) | 36.0 (9/25) | 40.0 (10/23) | 32.0 (8/23) | 32.0 (8/23)
HIV status | 33.3 (1/3) | 66.7 (2/3) | 33.3 (1/3) | 0.0 (0/3) | 33.3 (1/3) | 33.3 (1/3) | 66.7 (2/3) | 66.7 (2/3) | 66.7 (2/3) | 0.0 (0/3) | 0.0 (0/3) | 0.0 (0/3)
Sexual partners | 60.0 (3/5) | 0.0 (0/5) | 60.0 (3/5) | 20.0 (1/5) | 20.0 (1/5) | 20.0 (1/5) | 40.0 (2/5) | 40.0 (2/5) | 40.0 (2/5) | 60.0 (3/5) | 40.0 (2/5) | 40.0 (2/5)
Overall | 36.5 (19/52) | 32.7 (17/52) | 34.6 (18/52) | 26.9 (7/26) | 34.5 (9/26) | 38.5 (10/26) | 32.7 (17/52) | 32.7 (17/52) | 32.7 (17/52) | 34.6 (18/52) | 30.8 (16/52) | 30.8 (16/52)
eTable 12 Socioeconomic status results controlling for age. RDS-1 estimates are shown in bold if they are closer to the population proportions than the sample proportions.
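Age group (years) | SES | Population proportions | Sample proportions | RDS-1 estimates
0-29 | Highest | 0.218 | 0.159 | 0.178
0-29 | Higher | 0.237 | 0.246 | 0.215
0-29 | Lower | 0.237 | 0.286 | 0.336
0-29 | Lowest | 0.207 | 0.214 | 0.224
0-29 | Unknown | 0.102 | 0.095 | 0.047
30-39 | Highest | 0.289 | 0.197 | 0.198
30-39 | Higher | 0.250 | 0.227 | 0.266
30-39 | Lower | 0.244 | 0.323 | 0.290
30-39 | Lowest | 0.165 | 0.218 | 0.234
30-39 | Unknown | 0.052 | 0.035 | 0.012
40-49 | Highest | 0.292 | 0.214 | 0.254
40-49 | Higher | 0.282 | 0.268 | 0.219
40-49 | Lower | 0.212 | 0.255 | 0.237
40-49 | Lowest | 0.183 | 0.236 | 0.263
40-49 | Unknown | 0.030 | 0.027 | 0.026
50+ | Highest | 0.231 | 0.152 | 0.109
50+ | Higher | 0.232 | 0.234 | 0.223
50+ | Lower | 0.221 | 0.251 | 0.201
50+ | Lowest | 0.286 | 0.336 | 0.448
50+ | Unknown | 0.029 | 0.026 | 0.019
Combined | Highest | 0.257 | 0.178 | 0.179
Combined | Higher | 0.249 | 0.242 | 0.232
Combined | Lower | 0.229 | 0.279 | 0.263
Combined | Lowest | 0.214 | 0.256 | 0.302
Combined | Unknown | 0.052 | 0.044 | 0.025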
eFigure 1 Summary of reported network size of RDS recruits (excluding seeds)
eFigure 2 The distribution of network size, by definition (including seeds)
eFigure 3 The distribution of network size among the target population. Men recruited into the RDS study are shown in black. The network size definition used was NS-1. Recruits had a mean network size of 12.1 (based on 917 observations) and non-recruits 7.4 (based on 162 observations). The estimated mean network size in the whole target population was 9.2.
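The caption does not say how the whole-population mean of 9.2 was obtained. One simple approach, assumed here, weights the two group means by group size: 927 recruits (including seeds) and 1475 non-recruits, the figures given in the supporting methods. This gives a value that rounds to 9.2, but the authors' method may differ.

```python
def combined_mean(groups):
    """Population mean from subgroup means weighted by subgroup size."""
    total = sum(size for size, _ in groups)
    return sum(size * group_mean for size, group_mean in groups) / total


# Assumption: recruits (927, including seeds) and non-recruits (1475) weighted by headcount.
estimate = combined_mean([(927, 12.1), (1475, 7.4)])
print(round(estimate, 1))  # 9.2
```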
eFigure 4 The number of times members of the target population were identified as contacts by other recruits
eFigure 5 Proportion by which recruits over-recruited from their own village, by number of villages within 3km of their village. Network size definition NS-5.
eFigure 6. Pattern of recruitment, by village and HIV status. Map (left): symbols show the location of recruits' houses, and colours indicate the recruiters' villages. Circles indicate that the recruit and recruiter were from the same village, and triangles indicate that they were from different villages. Recruitment networks (right): the colour of the symbol indicates the recruit's village and the shape their HIV status (triangle=HIV positive, circle=HIV negative, square=HIV status unknown/not shown for seeds).
eFigure 6
eFigure 7 Recruitment networks, by seed. Seeds are shown at the top of each recruitment network. Symbol area is proportional to network size. Symbol shading indicates week of recruitment (darkest = earliest). Symbol shape indicates whether the recruit was not offered coupons (square), was offered coupons but did not accept them (triangles), or was offered and accepted coupons (circles).