Behind the Census

I cannot tell you how good the just-released data from the augmented 2018 Population Census are. I have no doubt that the professional statisticians in Statistics New Zealand have done a very good job – much better than the generic managers. The statistical release was accompanied by the report of an external expert advisory group, in whom I also have great confidence, and with more documentation than usual, as befits the release of a new data base.

However, it takes years of use by professionals before they can be sure that any glitches in a new data base have been dealt with. (The dealing is done at a practical collegial level, so the public is usually not aware of the glitches. There is nothing secret about it; it is just the way colleagues work, and if you hunt around you can usually find the details documented.)

Since we have had population censuses back to 1851, you may be surprised that it is a ‘new’ data base. However, this augmented census has been compiled in a different way from past ones (which were not without their glitches). To simplify, without going through the failures of the census enumeration:

Suppose you filled in your census form but failed to include your age. There exists a vast government data base known affectionately as ‘IDI’ – Integrated Data Infrastructure – containing microdata about people and households. The data comes from a range of government agencies, Statistics NZ surveys including the 2013 Census, and non-government organisations. The likelihood is that your age appears somewhere in the IDI – perhaps you answered in 2013 – in which case it has been inserted into the 2018 data.

Suppose you did not answer the census. The high likelihood is that you appear in a number of places in the IDI and its records can be amalgamated into a simulacrum of the census record you did not submit.
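To make the two cases above concrete, here is a minimal sketch of the gap-filling idea. It is not Stats NZ's actual method: the record layout, the person ids and the `augment` function are all hypothetical, and real linkage would need adjustments this ignores (for instance, ageing a 2013 answer forward by five years).

```python
# Hypothetical census returns keyed by person id; None marks an unanswered item.
census = {
    "p1": {"age": 34, "religion": "Anglican"},
    "p2": {"age": None, "religion": "Methodist"},  # returned a form, left age blank
}

# Hypothetical administrative records (a stand-in for the IDI).
# Note there is no religion field: agencies have no need to collect it.
admin = {
    "p2": {"age": 51},
    "p3": {"age": 27},  # never returned a census form
}

FIELDS = ("age", "religion")


def augment(census, admin):
    """Prefer the census answer; fall back to admin data; otherwise leave the gap."""
    out = {}
    for pid in sorted(set(census) | set(admin)):
        reported = census.get(pid, {})
        fallback = admin.get(pid, {})
        record = {}
        for field in FIELDS:
            value = reported.get(field)
            if value is None:
                value = fallback.get(field)  # still None if admin data is silent
            record[field] = value
        out[pid] = record
    return out


augmented = augment(census, admin)
```

In this toy case the person who skipped the age question gets an age, and the person who never answered gets a record built entirely from the administrative data, with no religion in it.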

Thus many of the gaps – including almost all the missing people – can be filled. Some cannot. The most frequently mentioned is iwi membership. I’ve never used this data, so I shall skip to some I do know about.

An easy example is the response on religion. Because there is no need for public agencies to collect your religion, the IDI is almost silent (the 2013 Census excepted). So those who did not respond to the question or were not enumerated are unlikely to have a religion in their augmented census record. (Note that people change their minds about their religion so the 2013 record may be misleading in 2018.)

Does it matter? Those of us interested in changes in social and religious history have valued the religious allegiance data, but given the reduction in formal religious attachment it may not be so important today. The only other use I have made of the religious response was in 1971, when I was able to rank the religions by per capita income. The main religions – Anglican, Methodist, Presbyterian and Roman Catholic – were bunched together with atheists. Agnostics were above them; I leave you to explain. Not surprisingly the low-income religions had heavy Maori membership. (There were not a lot of Pasifika in that year.)

Broadly the same problem applies to the reliability of ethnic classification (but not to the Maori descent response). People change their ethnicity over time, and even for different purposes at any one time. There will be some records in the IDI but they may not be consistent. The external experts caution that ‘there is significant variability in the quality of ethnicity data by ethnic group’. Which is a bloody nuisance.

I have been following those with multiple ethnicities over the years. They exist in contradiction to the public rhetoric where each person is treated as having a single ethnicity which is stable over time and circumstance. (When do you read a news report of a person with two ethnicities?)

Of particular interest to me is that about half of those who tick Maori ethnicity in the census also tick at least one other – typically European or Pakeha. When I compared the two groups on various socioeconomic measures – say, income – the scores were sufficiently different to suggest that there may be a distinct Maori-Pakeha ethnicity. The growth of this group is an indicator (but only one) of the ‘indigenisation’ of New Zealand culture. Frustratingly, we cannot rely on the 2018 Census to help us track one of the most important social changes going on in New Zealand.

The external experts also caution that ‘Household and Families data will generally be of low quality and will not enable comparisons with 2013.’ Bugger! In effect we now know less in 2018 about the fundamental social units of New Zealand than we knew in 2013.

There is an even more gloomy view that virtually all comparisons between 2013 and 2018 may not be very reliable, and that may also be true for 2018 and 2023 comparisons. A snapshot is only a snapshot; it is not a picture. The failure of the 2018 Census enumeration will be a problem for social statisticians and those interested in long-term social change for decades.

I have seen it argued that the gaps can be covered by sample surveys. But a sample survey needs a population frame to anchor it back to. Typically we get the frame from (or verify it against) a population census. For example, I was once working with the Household Survey and found a glitch in its weights (it overweighted solo parent households). Email to SNZ; their professionals checked my analysis and reweighted. These things happen; they get fixed.
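The dependence of a survey on its population frame can be sketched with simple post-stratification weights. This is my illustration, not the Household Survey's actual weighting scheme; the household types and all the counts are made up.

```python
def poststratify(sample_counts, frame_counts):
    """Weight for each group = households in the frame / households sampled."""
    return {group: frame_counts[group] / sample_counts[group]
            for group in sample_counts}


# Hypothetical survey: 500 responding households in two groups.
sample_counts = {"solo_parent": 50, "other": 450}

# Hypothetical frame totals, of the kind a census supplies.
frame_counts = {"solo_parent": 10_000, "other": 190_000}

weights = poststratify(sample_counts, frame_counts)

# Weighted totals reproduce the frame, so any error in the frame
# (say, too many solo parent households) flows straight into every estimate.
weighted_total = sum(weights[g] * sample_counts[g] for g in sample_counts)
```

If the frame counts are wrong, the weights are wrong, and no amount of care in the survey itself will put that right.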

What I am really cross about as far as the Census is concerned is the income question. Our longest personal income series is the one from Censuses back to 1926. Maori were excluded before 1951 (botheration) and the definition changes from market to all income in 1981. (This series does not give after-tax income which is what most of the public discussion is about; the best available series starts in 1982.) Fortunately there are overlaps in the change years so we can identify long-term trends. But there are also census-to-census variations around the trends.
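Why an overlap year matters can be shown with a simple ratio splice. This is a sketch only, and not necessarily how I or anyone else linked the actual series; the index values below are invented, and only the 1981 change of definition comes from the text.

```python
def splice(old_series, new_series, overlap_year):
    """Rescale the old-definition series so it meets the new one in the overlap year."""
    factor = new_series[overlap_year] / old_series[overlap_year]
    spliced = {year: value * factor
               for year, value in old_series.items()
               if year < overlap_year}
    spliced.update(new_series)  # new definition is used from the overlap year on
    return spliced


# Invented index numbers; 1981 is measured on both definitions.
market_income = {1971: 100.0, 1976: 110.0, 1981: 120.0}
all_income = {1981: 150.0, 1986: 165.0}

linked = splice(market_income, all_income, 1981)
```

Without a year measured both ways there is no factor to compute, which is the difficulty when a census switches its income source with no overlap.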

The 2013 Census data was sufficiently off-trend to suggest there may have been a new development since 2006. Or perhaps it was a bigger-than-usual normal variation. I decided to wait for the 2018 Census. Except I shan’t be able to use it. The augmented 2018 Census income uses the income tax records, and we know that in the past census-reported income did not match tax-reported income (I took this into consideration when I used the data). Stats NZ may be able to give an overlap by applying the tax records to the 2013 census. But we shan’t be able to say how successful it is until the 2023 census, which is a long wait since 2006. Sigh.

The purpose of this column is to give you confidence in the SNZ data base and the augmented 2018 Census by showing that the backroom professionals are much more competent than their generic managers. Is it not an irony that the professionals covered for the incompetence of the managers who gave them so little support?

In 2013 the State Services Commission appointed a generic manager with limited quantitative competence to a highly technical position which she proved incapable of managing. Has the SSC reviewed that 2013 appointment process? In particular, did it consult professionals or did it over-rely on generic managers who knew as little about statistics as the appointee? The fear is that the SSC will repeat the failed appointments procedure and again appoint someone as Government Statistician who lacks a good grasp of the technical issues which dominate the operations of Statistics New Zealand.

Postscript: In virtually all the discussions with professionals I have had on these matters, they have also complained about the difficulties they have with the SNZ web portal to its data bases. It is as if the generic managers don’t want to interact with outside experts either.