A Short Tour Through Some of This Geographical Header



The main guide for the "geographic header" data provided by this package is the US Census Bureau's sf2.pdf file, which is available from this page. In addition, there is some helpful information on the Census Bureau's view of geography, which includes a pointer to more useful articles.

Chapter 6 of sf2.pdf is the "Data Dictionary". It lists the columns, and provides information on how to interpret each column. On the top of page 6-2, it lists SUMLEV, with an internal link to endnote 2.

Endnote 2, on page 6-15, contains some text, then contains an internal link to "How to Use This Product", Chapter 2 of sf2.pdf.

Chapter 2 has a section, on page 2-3, titled "SUMMARY LEVEL SEQUENCE CHART". It includes, again, some text and another internal link, this time labeled "The summary level sequence chart", which leads to Chapter 4, "Summary Level Sequence Chart".

Pages 4-1 through 4-4 of Chapter 4 discuss "State Summary File 2". The data in this package is from the "National Summary File 2"; the description for this starts on page 4-5.

The National Summary File 2 appears to have two columns relevant to "summary level sequence". The first is "Geographical component" (GEOCOMP) (wait! We haven't seen that one yet! patience…). The second column is "Summary level", which turns out to be our old friend, SUMLEV. A SUMLEV of 000 (is that a string? an integer?) seems to have associated GEOCOMP values of:

00, 89–95,
A0–A2, C0–C2,
C7–CT, E0–E2,
E7–EJ, G0, H0

with a meaning of "United States". We tentatively conclude that this summary level relates to counts for the entire United States rather than, e.g., a value of 040, relating to (individual?) states.
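The chart gives the GEOCOMP values as ranges, so a small sketch of how one might expand them into individual codes may help. This is not taken from sf2.pdf: the function name `expand` is made up here, and the ordering of the second character (digits first, then letters, so that C7–CT runs C7, C8, C9, CA, ..., CT) is an assumption that should be checked against endnote 3.

```python
import string

# ASSUMPTION: within a range such as C7-CT, the second character runs
# through the digits first and then the uppercase letters.
ORDER = string.digits + string.ascii_uppercase


def expand(spec):
    """Expand one chart entry, e.g. "00" or "C7-CT", into a list of codes."""
    spec = spec.replace("\u2013", "-")  # accept the en dash used in the chart
    if "-" not in spec:
        return [spec]
    start, end = spec.split("-")
    if start[0] == end[0]:
        # Same leading character: walk the second character through ORDER.
        lo, hi = ORDER.index(start[1]), ORDER.index(end[1])
        return [start[0] + ORDER[i] for i in range(lo, hi + 1)]
    # Different leading characters: treat as a plain decimal range (89-95).
    return [f"{n:02d}" for n in range(int(start), int(end) + 1)]


# The entries listed for SUMLEV 000 in the chart, as transcribed above.
CHART_000 = ["00", "89-95", "A0-A2", "C0-C2", "C7-CT",
             "E0-E2", "E7-EJ", "G0", "H0"]

us_codes = [code for entry in CHART_000 for code in expand(entry)]
print(us_codes)
```

If the assumed ordering is wrong, only the `ORDER` string needs to change.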

NB: For reasons of space, this package only includes SUMLEV values 010, 020, 030, 040, and 050.
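On the "string or integer?" question: the leading zeros suggest that treating SUMLEV as a string is the safer choice. The following is a minimal sketch of filtering rows down to the summary levels kept in this package. The sample text, the `LABEL` column, the value 999, and the function name `keep_rows` are all made up for illustration; only the five SUMLEV values come from the note above.

```python
import csv
import io

# Hypothetical extract; the labels are placeholders, not real Census data,
# and 999 is simply a value outside the kept set.
SAMPLE = (
    "SUMLEV,LABEL\n"
    "010,placeholder-a\n"
    "040,placeholder-b\n"
    "999,placeholder-c\n"
)

# The summary levels this package includes, kept as strings.
KEPT = {"010", "020", "030", "040", "050"}


def keep_rows(text):
    """Return the rows whose SUMLEV is one of the kept summary levels."""
    reader = csv.DictReader(io.StringIO(text))  # csv reads every field as str
    return [row for row in reader if row["SUMLEV"] in KEPT]


rows = keep_rows(SAMPLE)
print([row["SUMLEV"] for row in rows])  # ['010', '040']

# Converting to an integer drops the leading zero, so "010" no longer
# matches the code as printed in the documentation.
print(int("010"))  # 10
```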


If we go back to page 6-2 of Chapter 6 ("Data Dictionary"), we find the variable ('column') GEOCOMP, "Geographical Component", which we met just above. This entry points at endnote 3.

Endnote 3, again on page 6-15, lists a number of GEOCOMP values (continuing on to page 6-18), and points again at Chapter 2, "How to Use This Product", for further information.


Once again we go back to page 6-2, where we see "Characteristic Iteration", CHARITER, which points us at endnote 4, as well as at Appendix H "for a full list of possible iterations" (implying maybe not all are used? At least, not all the time?).

Endnote 4, on page 6-18, gives some text, and also points at Appendix H.

Appendix H, "Characteristic Iterations", starts off promisingly:

This appendix lists the 331 possible iterations for Summary File 2.

and, reading on, we see that it encodes demographic ('racial') slices through the US population (and that a value of 001 indicates the entire population).
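As with SUMLEV, the leading zeros in CHARITER are easy to lose if a tool reads the column as a number. Here is a small sketch of a check for "entire population" rows that tolerates both forms. The helper names are hypothetical; only the value 001 and its meaning come from Appendix H as read above.

```python
# CHARITER value that, per Appendix H, indicates the entire population.
TOTAL_POPULATION = "001"


def normalize_chariter(value):
    """Return CHARITER as a three-character, zero-padded string."""
    return str(value).strip().zfill(3)


def is_total_population(value):
    """True if the (string or integer) CHARITER value means 'entire population'."""
    return normalize_chariter(value) == TOTAL_POPULATION


print(is_total_population("001"))  # True
print(is_total_population(1))      # True
print(is_total_population("002"))  # False
```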

Code book

Cornell's CISER helpfully provides an Excel spreadsheet [1] describing the columns in the sf2 file beyond the geographical header.

If we had included all of sf2 [2] (3GB zipped, 16GB unzipped!) rather than just the geographical header, we would have occasion to refer to the Code Book .xls file. But we haven't included it all.

In closing

Good luck with it!


[1] I use gnumeric to view this file.
[2] You can retrieve us2010.sf2.zip from this web page, should you have the bandwidth and disk storage. Better, you can retrieve it from CISER, so you'll have, e.g., column names.