Assessing the burden of AMR across the world, and especially in low- and middle-income countries, is challenged by gaps in data on prevalence and geographical distribution, mainly due to a lack of surveillance infrastructure and technical expertise [22, 28, 29, 30]. In countries where data are available, AMR outcomes are typically pooled to obtain national or regional averages [6]. However, given the great variation in quality and geographical spread between health facilities and laboratories, these averages may not be representative of the population [31]. Inference on population parameters based on data from a population sample is generally accompanied by 95% CIs as a measure of uncertainty. For AMR data, most studies assume that the data follow a binomial distribution and that each result is independent of all other results. In reality, however, outcomes from one health facility may be more similar to each other than to those from other centers, implying a degree of data correlation. Violation of the independence assumption therefore often leads to confidence interval estimates that are too narrow and unlikely to contain the true proportion of resistance in the population.
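To make the contrast concrete, the following minimal sketch compares a Wilson interval computed under the independence assumption with one adjusted for clustering through a design effect and an effective sample size. This is an illustrative assumption, not necessarily the cluster-robust or survey method evaluated in this study; the function names and the facility counts are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval (no continuity correction) for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def design_effect(facilities):
    """Ratio of a cluster-based variance of the pooled proportion to the
    binomial variance. `facilities` is a list of (resistant, tested) pairs."""
    k = len(facilities)
    total = sum(n for _, n in facilities)
    p = sum(x for x, _ in facilities) / total
    binom_var = p * (1 - p) / total
    if k < 2 or binom_var == 0:
        return 1.0
    # Linearised (ratio-estimator) variance with facilities as sampling units.
    cluster_var = k / (k - 1) * sum((x - p * n) ** 2 for x, n in facilities) / total ** 2
    # Floored at 1 so the adjusted interval is never narrower than the
    # naive one (a design choice made for this sketch).
    return max(1.0, cluster_var / binom_var)

def clustered_wilson(facilities, z=1.96):
    """Wilson interval evaluated at the effective sample size n / deff."""
    total = sum(n for _, n in facilities)
    p = sum(x for x, _ in facilities) / total
    n_eff = total / design_effect(facilities)
    return wilson_interval(p * n_eff, n_eff, z)

# Hypothetical (resistant, tested) counts for five facilities.
facilities = [(5, 100), (40, 100), (20, 100), (60, 100), (10, 100)]
resistant = sum(x for x, _ in facilities)
tested = sum(n for _, n in facilities)

naive = wilson_interval(resistant, tested)
adjusted = clustered_wilson(facilities)
print("naive width:   ", round(naive[1] - naive[0], 3))
print("adjusted width:", round(adjusted[1] - adjusted[0], 3))
```

With heterogeneous facilities such as these, the effective sample size is far smaller than the number of isolates, so the adjusted interval is much wider than the naive one.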
By randomly sampling facilities included in a large dataset of AMR data from the US, we demonstrate the bias in uncertainty measurements that is likely characteristic of most AMR data collected from multiple facilities. At low resistance levels, the bias in these results lessened but was not eliminated. At high rates, however, the impact was drastic: the coverage probability was only around 25% with the non-clustered methods, compared with over 80% with the clustered methods even with the fewest facilities, and closer to 95% as the number of sampled facilities increased. When accounting for clustering, the confidence intervals were 7–11 times wider on average than with the non-clustered methods. Moreover, the difference in confidence interval widths between the non-clustered and clustered methods increased as the number of sampled facilities decreased from 174 to 10. For instance, using the Wilson method and sampling from 174 facilities, representing the entire dataset for S. aureus, the average width of the confidence intervals for oxacillin-resistant isolates increased from 0.5 percentage points with the non-clustered methods to 5 percentage points with the clustered methods; when sampling from 10 facilities, the average width increased from 2.6 to 14 percentage points. The widening of the confidence intervals as the number of sampled facilities is reduced illustrates the increasing uncertainty when sampling from few units with great heterogeneity in the proportion of resistant isolates. In such instances, reporting AMR estimates for each cluster may be more appropriate than aggregating the samples.
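The logic of such a coverage experiment can be sketched as follows. This is a simplified illustration on synthetic data, not the TSN analysis itself: the beta-distributed facility rates, the isolate counts, the number of repetitions, and the design-effect adjustment are all assumptions chosen for the example.

```python
import math
import random

def wilson(p, n, z=1.96):
    """Wilson score interval for a proportion p on n (possibly effective) trials."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def intervals(sample):
    """Return (naive, cluster-adjusted) intervals for (resistant, tested) pairs."""
    k = len(sample)
    total = sum(n for _, n in sample)
    p = sum(x for x, _ in sample) / total
    naive = wilson(p, total)
    binom_var = p * (1 - p) / total
    if binom_var == 0:
        return naive, naive
    # Cluster-based variance with facilities as sampling units (assumed method).
    cluster_var = k / (k - 1) * sum((x - p * n) ** 2 for x, n in sample) / total ** 2
    deff = max(1.0, cluster_var / binom_var)
    return naive, wilson(p, total / deff)

def simulate_population(rng, n_facilities=174):
    """Synthetic facilities with heterogeneous resistance rates (hypothetical)."""
    population = []
    for _ in range(n_facilities):
        rate = rng.betavariate(2, 3)      # facility-specific resistance rate
        tested = rng.randint(50, 300)     # isolates tested at this facility
        resistant = sum(rng.random() < rate for _ in range(tested))
        population.append((resistant, tested))
    return population

def coverage(population, k, reps, rng):
    """Share of intervals containing the pooled proportion of the full dataset."""
    truth = sum(x for x, _ in population) / sum(n for _, n in population)
    hits_naive = hits_cluster = 0
    for _ in range(reps):
        sample = rng.sample(population, k)   # draw k facilities without replacement
        naive, clustered = intervals(sample)
        hits_naive += naive[0] <= truth <= naive[1]
        hits_cluster += clustered[0] <= truth <= clustered[1]
    return hits_naive / reps, hits_cluster / reps

rng = random.Random(0)
population = simulate_population(rng)
results = {k: coverage(population, k, 500, rng) for k in (10, 50)}
for k, (cov_naive, cov_cluster) in results.items():
    print(f"{k} facilities: naive={cov_naive:.2f}, clustered={cov_cluster:.2f}")
```

As in the analysis described above, the pooled proportion of the full synthetic dataset is treated as the "true" value against which coverage is scored.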
While increasing the number of facilities, and thus the number of isolates in the sample, reduced the bias (i.e., increased the coverage probability in most cases), the improvement was only marginal and not sufficient to remove the bias introduced by the violation of the data independence assumption. In some instances, as in the case of K. pneumoniae, increasing the number of sampled facilities led to a reduction in coverage probability. This is likely due to high geographical heterogeneity, as the survey method, which accounts for this type of sampling, did not show the same pattern. However, confidence intervals for small proportions of resistant isolates (less than or equal to 0.01) were wider and were associated with better coverage probabilities, even when data correlation was not taken into consideration. This observation is in line with a previous study that used simulations with real pharmacological data and recommended the Wilson score with continuity correction as one of the methods for constructing confidence intervals for very small proportions ranging from 0.001 to 0.1 [32]. Overall, though, AMR prevalence estimates derived from aggregated data should include stratification of samples according to their source or other shared qualities whenever possible. When this information is not available, alternative methods should be evaluated to improve the estimates, for example a bootstrap analysis that resamples results with random clustering. Future studies should evaluate methods to account for this uncertainty bias when the number of labs is less than ten.
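The random-clustering bootstrap mentioned above is not specified further here. As a related illustration only, the sketch below shows the ordinary cluster bootstrap for the case where facility labels are known: whole facilities are resampled with replacement and a percentile interval is taken. The function names, facility counts, and number of replicates are hypothetical.

```python
import random

def pooled_proportion(facilities):
    """Pooled proportion of resistant isolates over (resistant, tested) pairs."""
    return sum(x for x, _ in facilities) / sum(n for _, n in facilities)

def cluster_bootstrap_ci(facilities, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval that resamples whole facilities."""
    rng = random.Random(seed)
    k = len(facilities)
    estimates = sorted(
        # Resample k facilities with replacement, keeping each facility's
        # isolates together so within-facility correlation is preserved.
        pooled_proportion(rng.choices(facilities, k=k))
        for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical (resistant, tested) counts for eight facilities.
facilities = [(5, 100), (40, 100), (20, 100), (60, 100),
              (10, 100), (25, 100), (35, 100), (15, 100)]

lo, hi = cluster_bootstrap_ci(facilities)
print("point estimate:", pooled_proportion(facilities))
print("bootstrap interval:", (round(lo, 3), round(hi, 3)))
```

Resampling individual isolates instead of facilities would reproduce the independence assumption and yield intervals comparable to the non-clustered methods.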
The fact that employing cluster-robust errors instead of standard errors led to a significant improvement in coverage and a widening of the confidence intervals suggests that facility-level differences matter. There are several potential reasons for these differences. The first is that each facility reflects geographical differences in resistance. There is some evidence that local patterns of resistance may be important in S. aureus [33]. The second is that there may be differences in practice patterns that determine the probability that a patient is cultured. Variation in practice patterns is well documented in medicine [34, 35, 36]. For example, one study of blood culturing practices found wide variation in the rate of blood cultures per 1,000 patient days [37]. These variations in culturing practices could lead to large differences in resistance rate estimates. Finally, there may be differences in the quality of the culture or the laboratory. Contamination of samples with other organisms can affect resistance rates, either by including results for organisms that are not clinically important or by causing samples to be rejected. In addition, the sensitivity and quality of laboratory instruments can vary widely. While this is likely less of a problem in high-income countries, differences remain in how samples are processed that could introduce bias. In many low-income countries, however, resource constraints can result in drastic differences in laboratory quality.
The goal of this study was to assess the implications of the choice of methodology in CI construction. However, an important limitation of the robust error approach is that, while it adjusts standard errors for correlated data, it has no impact on the AMR estimates themselves, which are sensitive to data heterogeneity across the sampling facilities. This is especially important when there are significant differences in the number of samples processed in each facility, a scenario that would warrant the use of weighted analysis methods.
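A small worked example illustrates why the point estimate itself depends on weighting when sample volumes differ. The counts and function names below are hypothetical, and which weights are appropriate in practice (isolate counts, catchment populations, or something else) is not specified here and is left as an assumption.

```python
def pooled_estimate(facilities):
    """Isolate-weighted estimate: every isolate counts equally."""
    return sum(x for x, _ in facilities) / sum(n for _, n in facilities)

def facility_mean_estimate(facilities):
    """Equal-weight estimate: every facility counts equally."""
    return sum(x / n for x, n in facilities) / len(facilities)

def weighted_estimate(facilities, weights):
    """Weighted mean of facility-level proportions for arbitrary weights."""
    total_weight = sum(weights)
    return sum(w * x / n for (x, n), w in zip(facilities, weights)) / total_weight

# Hypothetical (resistant, tested) counts: one large facility with low
# resistance and two small facilities with high resistance.
facilities = [(90, 1000), (30, 50), (20, 50)]

print("pooled (isolate-weighted):", round(pooled_estimate(facilities), 3))
print("mean of facility rates:   ", round(facility_mean_estimate(facilities), 3))
```

The pooled estimate is dominated by the large facility, whereas the equal-weight mean is pulled toward the two small, high-resistance facilities. Cluster-robust standard errors widen the interval around whichever estimate is chosen but do not resolve which one is appropriate.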
While we attempted to use representative and generalizable data to assess these methodologies, the dataset used in our simulations contains data solely from the US, where heterogeneity in data sources may be considerably lower than in low- and middle-income countries. We nevertheless used the TSN database because it is one of the largest datasets available with high, representative coverage, which allowed us to simulate scenarios in which different numbers of facilities with different sample volumes and characteristics were chosen in the sampling frame. The results are important for estimating the burden of resistance in other settings, as they illustrate that even in settings with large geographic representation and high-quality labs, large uncertainties remain in AMR estimates. Finally, we assumed that the mean of the entire dataset was the “true” population mean, and estimated coverage probabilities were based on this value. While the dataset is large, it is itself a sample of the population. Biases in the dataset could constrain the implications of the results, but the larger point of the analysis is not affected by this limitation.