
Elections and Registration in Afghanistan (ERA) Project


Notes on Current Population Estimates by Province & District
ERA Topical Report #9


Thomas Eighmy
February 19, 2003

The attached Annex II has two levels of detail: province (Wilayet) and district (Woleswali). The former smooths and aggregates the irregularities of the latter, irregularities that are evident from detailed examination of the spreadsheet, the existing map of districts, and the diverse sources utilized.

The purpose of the exercise is to assemble these sources in a form that is immediately accessible and comparable for the purpose of registration and election planning. Note that a breakdown by current age and gender is not available. For very rough purposes, assume that the voting age population is about 50% of the population and that historically verifiable sex ratios were approximately 115 males per 100 females. Further fine-tuning is not worth the effort given the questionable quality of the underlying raw population estimates. The 1/200 Central Statistics Office (CSO) pre-enumeration household survey should provide better age-sex data as well as substantially better total population figures at the district level and below when the fieldwork is complete.
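The rule of thumb above can be written out as a short calculation. This is only a sketch: the function name and the sample population are hypothetical, and applying the overall sex ratio to the voting-age group is an assumption made here for illustration, not something the source data supports in detail.

```python
def rough_electorate(total_population, voting_age_share=0.50,
                     males_per_100_females=115.0):
    """Split a total population estimate into rough voting-age counts.

    Assumes (for illustration only) that the overall sex ratio also
    holds within the voting-age population.
    """
    voting_age = total_population * voting_age_share
    # 115 males per 100 females means males are 115 / 215 of the group.
    male_share = males_per_100_females / (males_per_100_females + 100.0)
    males = voting_age * male_share
    females = voting_age - males
    return {"voting_age": voting_age, "males": males, "females": females}


if __name__ == "__main__":
    # Hypothetical district of 100,000 people (not a figure from Annex II).
    for key, value in rough_electorate(100_000).items():
        print(f"{key}: {value:,.0f}")
```

Given the quality of the underlying totals, results of this kind are best read as orders of magnitude for planning, not as counts.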

However, several meetings with very competent CSO Afghan and advisory staff have been informative and have essentially resulted in a consensus to de-link registration from the pre-enumeration survey. Detailed results of the latter, leaked sequentially by province, will probably be substantially lower than the CSO's own estimates. This would raise difficult political issues deleterious to both exercises. Furthermore, the pre-census "knock on all doors" form identifies the names of heads of households, and the numbers, but not the names, of individual potential registrants over age 18. These forms are useful for planning and fine-tuning the registration process but, as noted previously, are not a substitute for it.

The CSO pre-enumeration survey results will refine the crude population estimates presented here, and this in itself is useful; but no further dependence of one on the other is needed given the time constraints on both the survey and registration exercises.

The CSO village facilities survey, with the Management Sciences for Health (MSH) geo-coded list of health centers as backup, will also provide a basis for selection of registration and polling places by district. MSH is one of the principal contractors in the health field. It geo-codes its health stations by specifying their latitudes and longitudes, and this specificity makes them potential registration/polling stations.

Province Key

Sequence Number (1 to 32)

Order of Presentation:

Province Code and Name are also 1 (Kabul) to 32 (Khost) but in a different order than the simple sequence number. The numeric code is that adopted by the UN from earlier demographic survey data and published on the UN AIMS (Afghanistan Information Management System) map entitled "Afghanistan Administrative Divisions and Associated Geo-Codes, June 2002."

The order is intentionally chosen, starting at 01 KABUL, circling through the country, and ending at 02 KAPISA, such that each province is contiguous with (connected to) the provinces above and below it on the list. Provinces with similar human and physical geography are, as much as possible, grouped together rather than scattered across the map as would happen with a simple numeric or alphabetic arrangement, and "new provinces" are listed below or between the older provinces from which they were carved. (For example, 29 PAKTYKA and 32 KHOST are listed between 06 GHAZNI and 07 PAKTYA, from which they were formed and with which they could be grouped in any future larger regional setting. The same is true of several other cases.)

Source and Number:

Central Statistics Office (CSO). These are the CSO, and hence official, population estimate figures. The CSO recognizes that these need refining. They were originally based upon the 1979 pre-census, but have lacked the comprehensive ground surveys to account for differential growth by district, and for refugee and IDP movements.

Centers for Disease Control (CDC). These assume complete immunization coverage of 0-5 year olds and that these represent a fixed 17% of the population, which is then extended to give total population estimates by reporting area (more or less, districts).
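The CDC extrapolation described above amounts to dividing the immunization count by 0.17. A minimal sketch follows; the function name and the sample count are hypothetical, and the `coverage` parameter is an addition made here only to show how sensitive the result is to the complete-coverage assumption.

```python
def cdc_style_total(immunized_under_five, under_five_share=0.17, coverage=1.0):
    """Extrapolate total population from a count of immunized 0-5 year olds.

    under_five_share: fraction of the population assumed to be aged 0-5
                      (the fixed 17% used in the CDC figures).
    coverage:         fraction of children actually reached; 1.0 reproduces
                      the complete-coverage assumption (hypothetical knob).
    """
    if not 0 < under_five_share < 1:
        raise ValueError("under_five_share must be between 0 and 1")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    children = immunized_under_five / coverage
    return children / under_five_share


if __name__ == "__main__":
    count = 17_000  # hypothetical reporting-area count, not source data
    print(round(cdc_style_total(count)))                # complete coverage
    print(round(cdc_style_total(count, coverage=0.8)))  # 80% coverage
```

If coverage was in fact incomplete, the same immunization count implies a larger population; if children were counted where they were immunized and not where they live, the estimate shifts between reporting areas.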

Oak Ridge Laboratories (ORL). These represent open source satellite data exploiting image footprints of differing land use intensity and light reflectivity within the bounded limits of districts. The number of recording units more closely matches the CSO numbers than do the CDC numbers. Ground truthing may be lacking.

The number of recording units for the CSO is the same as or higher than the number of recording units for the other two sources. If the number of recording units for a province differs substantially from one source to another, it indicates possible double counting and complicates comparison of the different estimates, but only at the district level, hence the provincial level smoothing. For purposes of claiming political patronage or an increased share of resources, there has historically been consistent pressure to increase the number of provinces and districts, sometimes without removing the constituent area and populations in the new districts from the parent districts. There is also a common belief that more districts and provinces will equate to more political representation in future governments, regardless of disparities in population size or voter turnout. Depending upon the number of seats in any future national election, some small districts need to be grouped and larger ones subdivided to achieve constituencies of approximately equivalent size. This is not an impossible task.

District Key

This follows the province-level data. Density calculations use the ORL data for which area estimates are available. (These need to be checked against external sources.) Some districts, largely new ones, are indicated by zeros in the district code and population columns. This indicates missing data or lack of reporting units from one of the sources and hence the complications and possible double counting noted above.
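As an illustration of the density calculation and the zero-as-missing convention, the sketch below treats a zero population or area as "no data" instead of computing a density of zero. The function name, the sample rows, and the use of square kilometers as the area unit are all assumptions made for illustration.

```python
def density(population, area_sq_km):
    """Return persons per square km, or None when either value is missing.

    Zeros are treated as missing data, following the convention that new
    districts appear with zeros in the population columns.
    """
    if population <= 0 or area_sq_km <= 0:
        return None
    return population / area_sq_km


if __name__ == "__main__":
    # Hypothetical rows: (district label, population, area in sq km).
    rows = [
        ("District A", 45_000, 900.0),
        ("District B (new)", 0, 350.0),
    ]
    for name, pop, area in rows:
        value = density(pop, area)
        label = "missing" if value is None else f"{value:.1f} per sq km"
        print(f"{name}: {label}")
```

Keeping missing values distinct from true zeros avoids dragging down averages and makes the districts affected by possible double counting easy to list.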

Conclusions

The coefficients of determination (r squared) between the three sources, at the smoothed province-level data, are high:

          CSO      CDC      ORL
CSO      -----    91.4%    89.6%
CDC               -----    86.3%
ORL                        -----

The fact that CDC and ORL achieve higher correlations with CSO than with each other gives some small credence to CSO as the best of the three measures.
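For readers who wish to reproduce this kind of comparison from the spreadsheet, the coefficient of determination between two sources is the square of the Pearson correlation of their province totals. The sketch below uses plain Python; the function name and the sample figures are hypothetical and are not taken from Annex II.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical province totals (thousands) from two sources.
    source_a = [3100, 450, 620, 980, 1200]
    source_b = [2900, 500, 580, 1100, 1150]
    print(f"r squared = {r_squared(source_a, source_b):.1%}")
```

Note that a high r squared shows only that two sources rank and scale provinces similarly; it does not show that either source has the level of the population right.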

At the district level, there are some wild fluctuations among sources and a number of closer fits for two sets and occasionally for all three sets (where reporting units are equal). This lends some credence to the diverse methods involved in deriving the resulting estimates. All estimates converge on a total population inside the borders of about 21 million. All estimates may be too high, even if the 1979 CSO base is correct, because of the unsustainably high growth rates employed. (The author, despite his own warnings of number inflation, has used the same high rates of 2.2% as indicated in the 1974 demographic survey and 2.4% from 1979 to 1990 and beyond as calculated by outside demographers. In retrospect, the 2.4% figure is too high and, compounded over a number of years, results in overestimation. CSO uses 1.92%. Even this may overestimate current population inside the boundaries. IDPs and refugees add an additional layer of complication. True nomads, with no fixed part-time residence, are consistently overestimated.)
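The effect of compounding can be illustrated with a short calculation. This is a sketch only: the base population is a hypothetical round number (no 1979 figure is quoted here), and the 23-year span is an assumption corresponding to 1979 through 2002.

```python
def project(base_population, annual_rate, years):
    """Compound a base population forward at a constant annual growth rate."""
    return base_population * (1.0 + annual_rate) ** years


if __name__ == "__main__":
    base = 1_000_000  # hypothetical base, for illustration only
    years = 23        # assumed span, 1979 to 2002

    high = project(base, 0.024, years)   # rate from outside demographers
    cso = project(base, 0.0192, years)   # rate used by CSO

    print(f"at 2.40%: {high:,.0f}")
    print(f"at 1.92%: {cso:,.0f}")
    print(f"excess of high over CSO rate: {high / cso - 1:.1%}")
```

Under these assumptions, the difference of less than half a percentage point per year compounds to a gap on the order of 11 percent, which is why the choice of growth rate matters more than its size suggests.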

The CDC's assumption of complete infant coverage may be questionable, and it seems immunizations may have been recorded where they were administered rather than where the child lived. ORL may underestimate urban populations (little agricultural land and few lights). CDC data has indirect evidence of border crossing from Pakistan. ORL data suggests some population padding in small districts as indicated by high density figures. Large populations in the small (in area) districts and high population densities in some border areas may indicate insufficient agricultural land to support the estimated populations, population estimates that are too high, or both. The "fit" of different estimates seems slightly better in the north, except for a proliferation of new districts in Badakshan and Takhar.

Overall, fair registration and election systems will have to be designed around the apparent population data flaws. A potential fall-back position would be to rely on the 1979 figures, which are proportionally correct and represent the distribution of population before the disturbances of the last 23 years. This could easily be added to the data set.