Technical note on the treatment of missing data in the 11-station series

In climate science, there are a number of accepted methods to account for missing data in temperature series. This note explains in technical terms what we did for the 11-station series.

Data may be missing for a number of reasons, such as instrument failure, an observer not recording a reading, or a station having closed.

For convenience, we have posted an Excel spreadsheet of the series.

11-Station annual temperature series data (XLS 30 KB)

All raw data for this series are freely available from the NIWA climate database, CliFlo.


Missing months during the year

Some of the raw data records have missing months. Annual values could be calculated only for those years with no missing months (this was the approach used to produce the graphs linked below), but this would discard a significant amount of information.

'Eleven-station' series temperature data

Alternatively, an annual average can be calculated even if some months are missing. This should not be done by simply averaging the temperatures of the months that are present: the annual value would then be biased low if a summer month were missing, or biased high if a winter month were missing. The correct procedure is:

  • calculate the monthly anomalies by subtracting the climatology for that month (in this case the 1961-1990 average), then
  • average the monthly anomalies to obtain the annual anomaly, ignoring missing months.
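The two steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used to produce the series: the function names and the monthly climatology values are hypothetical, chosen only to show why averaging raw temperatures is biased when a summer month (January, in New Zealand) is missing, while averaging anomalies is not.

```python
# Minimal sketch: annual anomaly from monthly data with missing months.
# CLIM and the 0.5 degC offset below are hypothetical illustration values.

def annual_anomaly(monthly_temps, climatology):
    """Average the monthly anomalies, ignoring missing months (None)."""
    anomalies = [t - c for t, c in zip(monthly_temps, climatology) if t is not None]
    return sum(anomalies) / len(anomalies) if anomalies else None


def naive_annual_anomaly(monthly_temps, climatology):
    """Biased approach: average the raw temperatures of the months present,
    then subtract the 12-month climatological mean."""
    present = [t for t in monthly_temps if t is not None]
    return sum(present) / len(present) - sum(climatology) / len(climatology)


# Hypothetical monthly climatology (degC), January to December.
CLIM = [18.0, 18.0, 16.0, 14.0, 11.0, 9.0, 8.0, 9.0, 11.0, 13.0, 15.0, 17.0]

# Every month is 0.5 degC above its climatology, but January is missing.
year = [c + 0.5 for c in CLIM]
year[0] = None

print(annual_anomaly(year, CLIM))                  # 0.5
print(round(naive_annual_anomaly(year, CLIM), 3))  # 0.068 (biased low)
```

Every month in this made-up year is 0.5 °C above normal, and the anomaly method recovers exactly that. The naive method reports only about 0.07 °C because the warm January has dropped out of the average.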

There is a trade-off between making the most of the available data and increasing the uncertainty in the estimated annual value: the more missing months are allowed, the less certain each annual value becomes.

For the spreadsheet, we decided to allow at most one missing month in any year. This adds about 50 station-years to the 11-station series that would otherwise be missing. If two or more months are missing in a year, the annual value for that year is treated as missing.
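The "at most one missing month" rule can be sketched as a small extension of the anomaly calculation. This is a hedged illustration, not the production code: the function name, the `max_missing` parameter and the climatology values are all hypothetical, and missing months are assumed to be marked with `None`.

```python
# Minimal sketch of the "at most one missing month" rule.
# Function name, parameter and CLIM values are hypothetical; None = missing.

MAX_MISSING_MONTHS = 1


def annual_anomaly_with_limit(monthly_temps, climatology, max_missing=MAX_MISSING_MONTHS):
    """Return the mean monthly anomaly, or None if too many months are missing."""
    if len(monthly_temps) != 12 or len(climatology) != 12:
        raise ValueError("expected 12 monthly values")
    n_missing = sum(1 for t in monthly_temps if t is None)
    if n_missing > max_missing:
        return None  # the annual value is treated as missing
    anomalies = [t - c for t, c in zip(monthly_temps, climatology) if t is not None]
    return sum(anomalies) / len(anomalies)


CLIM = [18.0, 18.0, 16.0, 14.0, 11.0, 9.0, 8.0, 9.0, 11.0, 13.0, 15.0, 17.0]
full = [c + 0.5 for c in CLIM]
one_gap = [None] + full[1:]
two_gaps = [None, None] + full[2:]

print(annual_anomaly_with_limit(full, CLIM))      # 0.5
print(annual_anomaly_with_limit(one_gap, CLIM))   # 0.5
print(annual_anomaly_with_limit(two_gaps, CLIM))  # None
```

Making the limit a parameter keeps the trade-off explicit: raising `max_missing` keeps more station-years but makes each retained annual value less certain.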

Missing annual values or missing sites

The resulting annual anomaly series for individual sites still have missing values in some years. This includes cases where a site had not yet opened (e.g., Invercargill, pre-1949) or had closed (e.g., Molesworth, 1994).

The 'eleven'-station average is simply the average of the individual station anomalies, taken over those stations for which an annual anomaly is available in that year.
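A minimal sketch of this averaging step follows, assuming each year's station anomalies are held in a dictionary with `None` for stations that have no annual value. The station names and values are hypothetical, and returning `None` when no station reports is an assumption; the note does not cover that case.

```python
# Minimal sketch: network anomaly = mean of the station anomalies available
# in a given year. Station names and values are hypothetical.

def network_anomaly(station_anomalies):
    """Average the annual anomalies of stations that have data (not None)."""
    available = [a for a in station_anomalies.values() if a is not None]
    if not available:
        return None  # assumption: no stations reporting -> no network value
    return sum(available) / len(available)


one_year = {"station_a": 0.5, "station_b": -0.25, "station_c": None}
print(network_anomaly(one_year))  # 0.125
```

The missing station is simply left out of both the sum and the count, so the divisor changes from year to year with the number of stations reporting.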

This is a very important point, and one that is often misunderstood. Because we average the anomalies (each site's deviation from its own climatology) rather than the actual temperatures, no bias is introduced if the mix of cold and warm sites changes over the period of analysis. For example, over the period 1931-1937, Palmerston North is colder than Queenstown in ‘anomaly’ terms, even though Queenstown is colder by about 3 °C in absolute terms. Averaging the absolute temperatures would bias the results; averaging the anomalies does not.

In the early years of the series there are five sites: Tauranga, Hamilton, Ruapehu, Palmerston North and Queenstown (although Ruapehu is often missing). Ruapehu and Queenstown are the coldest sites in absolute terms. In later years there are more sites that are ‘warm’ in absolute terms. This might lead someone to conclude that the series has a spurious warming trend, but that conclusion would rest on a fundamental misunderstanding. What we are analysing is whether the temperature is getting warmer or colder at each site where data exist. Any such trend shows up in the anomalies, not in the actual temperatures, so it is the anomalies that should be averaged.
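The effect of a changing station mix can be illustrated with a toy example. This is a hypothetical sketch, not real station data: a cold site reports throughout, a warm site opens in year 3, and neither site warms at all (every annual value equals that site's own climatology). The site names, temperatures and function names are invented for illustration.

```python
# Minimal sketch of station-composition bias. Two hypothetical sites:
# a cold site that reports throughout and a warm site that opens in year 3.
# Neither site warms: every annual value equals its own climatology.

CLIMATOLOGY = {"cold_site": 8.0, "warm_site": 15.0}
TEMPS = {
    "cold_site": [8.0, 8.0, 8.0, 8.0],
    "warm_site": [None, None, 15.0, 15.0],  # None = site not yet open
}


def mean_available(values):
    """Mean of the non-missing values, or None if all are missing."""
    vals = [v for v in values if v is not None]
    return sum(vals) / len(vals) if vals else None


def network_series(temps, climatology, use_anomalies):
    """Yearly network average of either absolute temperatures or anomalies."""
    n_years = len(next(iter(temps.values())))
    series = []
    for year in range(n_years):
        values = []
        for site, record in temps.items():
            t = record[year]
            if t is None:
                values.append(None)
            elif use_anomalies:
                values.append(t - climatology[site])
            else:
                values.append(t)
        series.append(mean_available(values))
    return series


print(network_series(TEMPS, CLIMATOLOGY, use_anomalies=False))  # [8.0, 8.0, 11.5, 11.5]
print(network_series(TEMPS, CLIMATOLOGY, use_anomalies=True))   # [0.0, 0.0, 0.0, 0.0]
```

The absolute-temperature average jumps by 3.5 °C when the warm site opens, even though neither site has warmed. The anomaly average stays at zero, which is the correct answer.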