SoVI® Error Discussion
Error in SoVI® 2006-10
Error is inherent in any compound metric or index, and SoVI® 2006-10 is no exception. Unfortunately, the detailed computation procedures and the compilation of data from multiple sources make calculating error for SoVI® 2006-10 prohibitively difficult. Only 13 of the 30 variables in SoVI® 2006-10 have margins of error suitable for computing a coefficient of variation. For this reason, we do not provide error calculations for SoVI® 2006-10. Instead, we offer the following steps illustrating how we computed error for the SoVI® 2005-09 metric, which is based primarily on American Community Survey data. Users should exercise caution when viewing SoVI® data in the highlighted counties, where data were found to be less reliable because of small populations of interest and a low sampling rate.
Error Calculation Process for SoVI® 2005-09 Using the American Community Survey
Step 1. Collect American Community Survey (ACS) data, including margins of error, for all variables used in the calculation of SoVI®.
Step 2. Calculate SoVI® 2005-09 (see SoVI® FAQ Page).
Step 3. Calculate the standard error for all variables used to calculate SoVI®.
Example: QPOVTY for Richland County, SC is the percentage of the total population with income below the poverty level. Thus, calculate standard errors for both the numerator (number of people below the poverty level) and the denominator (total population).
Step 4. Calculate standard errors and margins of error for derived SoVI® variables. Instructions for calculating error for derived variables are available from ACS.
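As a rough illustration of Steps 3 and 4, the sketch below converts margins of error to standard errors and then derives the standard error of a percentage. It assumes the standard ACS approximation for a derived proportion (with the fallback to the ratio formula when the term under the square root is negative). The function names and the input counts are hypothetical and are not the Richland County figures.

```python
import math

Z_90 = 1.645  # ACS margins of error are published at the 90% confidence level


def moe_to_se(moe):
    """Convert a 90% margin of error to a standard error."""
    return moe / Z_90


def proportion_se(num, num_se, den, den_se):
    """Approximate standard error of the proportion num / den.

    Uses the ACS approximation for a derived proportion. If the term
    under the square root is negative, it falls back to the ratio formula.
    """
    p = num / den
    radicand = num_se ** 2 - (p ** 2) * (den_se ** 2)
    if radicand < 0:
        radicand = num_se ** 2 + (p ** 2) * (den_se ** 2)
    return math.sqrt(radicand) / den


# Hypothetical counts and margins of error, for illustration only.
below_poverty, below_poverty_moe = 45_000, 2_600
total_pop, total_pop_moe = 355_000, 1_200

se = proportion_se(
    below_poverty, moe_to_se(below_poverty_moe),
    total_pop, moe_to_se(total_pop_moe),
)
pct = 100 * below_poverty / total_pop   # derived percentage
pct_moe = 100 * se * Z_90               # margin of error of the percentage
print(f"{pct:.4f}% +/- {pct_moe:.4f}%")
```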
Step 5. For each derived SoVI® variable, calculate the coefficient of variation (CV).
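Example: QPOVTY for Richland County, SC is estimated to be 12.6598% with a margin of error of +/- 0.7359%. Dividing the margin of error by 1.645 converts it from the 90% confidence level used by ACS to a standard error. The coefficient of variation is: (0.7359/1.645) / 12.6598 = 0.0353. This expresses the standard error as a proportion of the QPOVTY estimate, i.e., the relative amount of sampling error in the estimate.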
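The Richland County arithmetic can be reproduced with a few lines of Python. This is a minimal sketch; the function name is ours, and the only inputs are the estimate and margin of error quoted in the example.

```python
Z_90 = 1.645  # factor for converting a 90% margin of error to a standard error


def coefficient_of_variation(estimate, moe):
    """CV = standard error / estimate, with the standard error taken as moe / 1.645."""
    standard_error = moe / Z_90
    return standard_error / estimate


cv = coefficient_of_variation(12.6598, 0.7359)
print(round(cv, 4))  # 0.0353
```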
Step 6. Compute the median CV across all derived SoVI® variables for each county. Due to the use of non-ACS data, imputation methods, and/or GIS operations in calculating some SoVI® variables, we were able to calculate margins of error and CVs for only 24 of the 31 total variables.
Example: The median CV across these 24 variables for Richland County, SC is 0.0243.
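Step 6 amounts to taking a median over whichever variables have a usable CV. The sketch below shows one way to do this for a single county. The variable names other than QPOVTY and all of the CV values are hypothetical, and `None` is our own convention for a variable whose CV could not be calculated.

```python
from statistics import median

# Hypothetical CVs for one county; None marks a variable with no usable CV
# (for example, one built from non-ACS data).
county_cvs = {
    "QPOVTY": 0.0353,
    "VAR_B": 0.0200,
    "VAR_C": None,
    "VAR_D": 0.0300,
    "VAR_E": 0.0100,
}


def median_cv(cvs):
    """Median of the CVs that could be calculated, ignoring missing ones."""
    usable = [cv for cv in cvs.values() if cv is not None]
    return median(usable)


county_median = median_cv(county_cvs)
print(round(county_median, 4))
```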
Step 7. Aggregate the median CVs for all counties into one national dataset and examine the shape of the distribution.
Step 8. Identify any extreme outliers in the distribution and trim them from the dataset. We removed three counties as extreme outliers: Kalawao, HI; King, TX; and Loving, TX.
Step 9. Plot a histogram of the remaining county median CVs. (The visualization below was created in JMP® 8.0.2).
Step 10. Fit a distribution to the data and obtain parameters.
We used the worksheet add-ins for Microsoft Excel available from a free trial version of EasyFit 5.5 to fit a gamma distribution to the data.
- Shape parameter: 2.3241
- Scale parameter: 0.03853
- Location parameter (gamma): 0.00335
Step 11. Calculate a gamma mean and gamma standard deviation.
Gamma standard deviation: (2.3241 * (0.03853^2)) ^ 0.5 = 0.058739
Step 12. Set a threshold of two standard deviations above the mean. Any counties with a median CV across the 24 derived SoVI® variables that exceeds this value are identified as having the highest uncertainty in SoVI® scores due to sampling error. (Note: The three outlier counties were added back into the dataset at this step and grouped with other counties displaying the highest uncertainty).
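Steps 11 and 12 can be sketched as follows. The standard deviation reproduces the figure given above. The mean and the resulting threshold are not stated in this document, so the values computed here rest on an assumption: that the third fitted value (listed as the gamma parameter) is a location shift, giving mean = shape * scale + location. The county names and median CVs in the example are hypothetical.

```python
import math

SHAPE = 2.3241
SCALE = 0.03853
LOCATION = 0.00335  # assumption: the third fitted parameter is a location shift

# Moments of a (shifted) gamma distribution.
gamma_sd = math.sqrt(SHAPE * SCALE ** 2)   # matches the 0.058739 given in Step 11
gamma_mean = SHAPE * SCALE + LOCATION      # assumed form; not stated in the source

# Step 12: two standard deviations above the mean.
threshold = gamma_mean + 2 * gamma_sd


def high_uncertainty(median_cvs, cutoff):
    """Return the counties whose median CV exceeds the cutoff."""
    return [name for name, cv in median_cvs.items() if cv > cutoff]


# Hypothetical county median CVs, for illustration only.
example = {"County A": 0.0243, "County B": 0.2500, "County C": 0.1100}
print(round(gamma_sd, 6), high_uncertainty(example, threshold))
```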
Step 13. Counties with the highest uncertainty in SoVI® score are shown in orange for the nation. This map is available here.
Additional Notes regarding ACS margins of error
• Where the variable estimates were zero, we used a value of one instead to allow calculation of margins of error and coefficients of variation. If zero is used, it is not possible to calculate a CV because a zero will reside in the denominator.
• Where margins of error were omitted from ACS data, we assigned a value of zero. ACS documentation indicated these estimates were calibrated to match other census data sources, thus making specification of margins of error impossible.
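The two substitution rules above can be applied before computing a CV, as in this minimal sketch. The function names are ours, and `None` is an assumed placeholder for a margin of error that ACS omitted.

```python
Z_90 = 1.645  # factor for converting a 90% margin of error to a standard error


def prepare_for_cv(estimate, moe):
    """Apply the two substitution rules described in the notes.

    - A zero estimate is replaced with one so the CV denominator is nonzero.
    - An omitted margin of error (None here) is replaced with zero.
    """
    adjusted_estimate = 1 if estimate == 0 else estimate
    adjusted_moe = 0 if moe is None else moe
    return adjusted_estimate, adjusted_moe


def safe_cv(estimate, moe):
    """CV computed after the substitutions above."""
    est, m = prepare_for_cv(estimate, moe)
    return (m / Z_90) / est


print(safe_cv(0, 12))       # zero estimate is treated as one
print(safe_cv(5000, None))  # omitted margin of error yields a CV of zero
```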