I have recently been working with some social scientists on estimating the prevalence of certain disabilities in the UK regions, using census data, the Health Survey for England and R. One of the aims is to show the stats on a map, and Edina's thematic mapping service helps here. However, I was having difficulty with inconsistencies in the UK district classifications. Why is the source data for each of the countries in the UK (again available through Edina) published in a slightly different format? One of the files looks like this:
COUNTY_CODE_2001 DISTRICT_CODE_2001 DISTRICT_NAME_2001
00 AA City of London
So we have two files in CSV, one tab-delimited, one that puts quotes around fields and headers, one that joins the two codes into a single four-character string, and a different column order in each file. How much time could we all save if the creators of such data talked to one another?
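A minimal sketch of one way to paper over these differences, written in Python rather than R for illustration. It reads each variant into the same (county code, district code, district name) shape. The function name `read_districts`, the sample strings, and the `CODE`/`NAME` headers in the joined-code sample are made up for the example. Only the three `*_2001` header names and the City of London row come from the file shown above. It also assumes that a joined code is simply the two-character county code followed by the district code.

```python
import csv
import io


def read_districts(text, delimiter, name_col, county_col=None,
                   district_col=None, joined_col=None):
    """Parse one lookup file into (county, district, name) tuples.

    Hypothetical helper: the caller says which delimiter and which
    column names this particular file uses.
    """
    # The csv module strips surrounding double quotes from fields and
    # headers by default, so quoted and unquoted files parse alike.
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for record in reader:
        if joined_col is not None:
            # Assumption: first two characters are the county code.
            code = record[joined_col]
            county, district = code[:2], code[2:]
        else:
            county, district = record[county_col], record[district_col]
        # Codes stay as strings so that "00" is not turned into 0.
        rows.append((county, district, record[name_col]))
    return rows


# Illustrative samples of three of the layouts described above.
tab_sample = ("COUNTY_CODE_2001\tDISTRICT_CODE_2001\tDISTRICT_NAME_2001\n"
              "00\tAA\tCity of London\n")
quoted_sample = ('"DISTRICT_NAME_2001","DISTRICT_CODE_2001","COUNTY_CODE_2001"\n'
                 '"City of London","AA","00"\n')
joined_sample = "CODE,NAME\n00AA,City of London\n"

tab_rows = read_districts(tab_sample, "\t", "DISTRICT_NAME_2001",
                          county_col="COUNTY_CODE_2001",
                          district_col="DISTRICT_CODE_2001")
quoted_rows = read_districts(quoted_sample, ",", "DISTRICT_NAME_2001",
                             county_col="COUNTY_CODE_2001",
                             district_col="DISTRICT_CODE_2001")
joined_rows = read_districts(joined_sample, ",", "NAME", joined_col="CODE")
```

Because columns are looked up by header name rather than position, the different column orders stop mattering. The delimiter and the header names still have to be supplied per file.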