Sunday, November 1, 2015

Coverage of GHCN V4 compared

In an earlier post, I took a first look at GHCN V4 beta. Details, sources, etc. are there. I've been looking with a practical eye, because at some stage I will have to adapt TempLS to use it. As remarked there, by me and others, GHCN V4 has a lot of extra stations, but not proportionately better coverage. Its merit may be with homogenisation, rather than a better global average.

In this post I'll look in more detail at that issue of coverage. In the back of my mind, I am thinking about how to use a reduced set. This is not just to save computing time; big disparities in station density actually create accuracy problems.

I will compare using the cubed sphere, which I described recently. It gives a grid of cells of almost equal area. I have since noticed that it has recently been adopted by GFDL, and is described here.
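To make the idea of binning onto a cubed sphere concrete, here is a minimal sketch in Python. It is not the mapping from my earlier post: the function name `cube_cell` is hypothetical, and the equiangular face coordinate (equal steps in angle rather than in projected distance) is an assumption, chosen because it is one common way to even out cell areas.

```python
import math

def cube_cell(lat_deg, lon_deg, n=16):
    """Return (face, i, j) of the cubed-sphere cell containing a point.

    Assumes an equiangular mapping on each face; the grid used in the
    post may use a different mapping to equalise areas.
    """
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    comps = (
        math.cos(lat) * math.cos(lon),  # x
        math.cos(lat) * math.sin(lon),  # y
        math.sin(lat),                  # z
    )
    # The largest coordinate (in magnitude) picks the cube face:
    # 0, 1, 2 for +x, +y, +z and 3, 4, 5 for -x, -y, -z.
    axis = max(range(3), key=lambda k: abs(comps[k]))
    face = axis if comps[axis] > 0 else axis + 3
    # Gnomonic projection of the other two coordinates onto the face, in [-1, 1].
    u, v = (comps[k] / abs(comps[axis]) for k in range(3) if k != axis)

    def index(t):
        # atan maps [-1, 1] to [-pi/4, pi/4]; rescale to [0, 1] and bin.
        frac = (math.atan(t) + math.pi / 4) / (math.pi / 2)
        return min(n - 1, int(frac * n))

    return face, index(u), index(v)
```

With n=16 this gives 6 x 16 x 16 = 1536 cells. Counting the data in each cell is then just a tally of the returned triples.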

I'll show a WebGL plot (each face divided 16x16) with cells colored by the number of datapoints within, for the data of August 2015. You can switch between the data I currently use and the GHCN V4 data with full ERSST. It shows cells with zero (and sparse) data, and also cells with a great deal. I'll also show histograms for comparison. I'll then briefly discuss strategies for rationalization.

Here is the WebGL plot. As usual, the Earth is a trackball that you can rotate. The "Switch" button switches between V3 and V4 data. Cells are colored by the number of data points (dots) within. The key shows the number of data per cell. Discussion follows below the plot.

The V3 picture is influenced by my thinning of the ERSST data from 2°x2° to 4°x4°, basically to match the land density. So ocean cells have typically 1 or 2 datapoints (but not 0). I think for SST this is quite satisfactory, since spatial variability is modest. With V4 I haven't thinned, so ocean cells have a lot of data. It's better for comparison to focus on the land.
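As an illustration of that thinning, here is a small Python sketch that keeps every second row and column of a 2° grid. The helper names `ersst_axes` and `thin` are hypothetical, and the grid layout (cell centres at latitudes -88..88 and longitudes 0..358) is an assumption about ERSST rather than something stated in this post. The thinning I actually use may differ in detail.

```python
def ersst_axes(step=2):
    """Cell-centre latitudes and longitudes of a regular grid, in degrees.

    Assumes centres at -88..88 and 0..358 (an assumed 2-degree ERSST layout).
    """
    lats = list(range(-88, 89, step))
    lons = list(range(0, 360, step))
    return lats, lons

def thin(values, keep_every=2):
    """Keep every `keep_every`-th row and column of a lat x lon grid."""
    return [row[::keep_every] for row in values[::keep_every]]

lats, lons = ersst_axes()
grid = [[(la, lo) for lo in lons] for la in lats]   # stand-in for SST values
thinned = thin(grid)
```

Under these assumptions the grid drops from 89 x 180 points to 45 x 90 before land and sea ice are removed.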

Because it is SH winter, there are large areas of sea ice. ERSST assigns these a value of -1.8°C (freezing point of sea water), but I remove them. This creates a lot of empty cells. Six months earlier, the Arctic would have appeared thus. The main land area to compare is Africa. V4 does have fewer empty cells, but still some.
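Removing the sea ice points is a simple filter. The sketch below is in Python, with a hypothetical function name and a tolerance that I have assumed so that rounding in the file does not let ice points through.

```python
SEA_ICE_SST = -1.8  # deg C, the value ERSST assigns under sea ice

def drop_sea_ice(points, tol=0.01):
    """Keep (lat, lon, sst) points whose SST is not the sea-ice value.

    `tol` is an assumed tolerance, not a value taken from ERSST.
    """
    return [p for p in points if abs(p[2] - SEA_ICE_SST) > tol]
```

For example, `drop_sea_ice([(-70, 0, -1.8), (-50, 0, 2.3)])` keeps only the second point.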

Here are histogram plots of the number of data in each cell. The blocks on the right each cover a range of counts; the label shown is the minimum of that range. With V4, there is a big block of cells with about 6-9 stations. This happens because of the regular SST grid, which tends to give 9 points per cell, but with frequent variation. With V3, with thinned SST, the majority of ocean cells are in the 1-3 range.
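The binning behind such a histogram can be sketched as follows, in Python. The bin edges here are hypothetical, picked only to show single-count bins on the left and wider bins labelled by their minimum on the right. They are not the edges used in the plots.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin minima: single counts up to 9, then wider ranges.
EDGES = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 50]

def histogram(cell_counts, edges=EDGES):
    """Number of cells per bin, keyed by the bin's minimum count.

    Bin k covers edges[k] <= n < edges[k + 1]; the last bin is open-ended.
    """
    tally = Counter(edges[bisect_right(edges, n) - 1] for n in cell_counts)
    return {e: tally.get(e, 0) for e in edges}
```

Here `cell_counts` is the list of data counts, one entry per cell, including the zeros for empty cells.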

[Histograms: "Version 4 GHCN with ERSST" and "V3 GHCN with reduced ERSST"]

I think a reasonable number of data per cell to aim for is four. Relative to a single datum per cell, that gives about half the standard error of the mean (which scales as 1/√n), and about 6000 data in total (1536 cells × 4). I would probably create a reduced set to be used for all months, so to cover back to 1900, there would be more than four needed in total. But there is ample scope for being choosy in many places.
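The arithmetic, and one possible way of being choosy, can be sketched in Python. The selection rule (prefer the longest records in each cell) is an assumption for illustration only, as are the function name and the tuple layout; I have not settled on a method.

```python
import math
from collections import defaultdict

N_FACE = 16
n_cells = 6 * N_FACE * N_FACE      # cells on the cubed sphere
target = 4                         # data per cell to aim for
total = n_cells * target           # about 6000
se_ratio = 1 / math.sqrt(target)   # standard error relative to one datum

def thin_stations(stations, per_cell=target):
    """Keep at most `per_cell` stations in each cell.

    `stations` is an iterable of (station_id, cell, record_years) tuples.
    Preferring the longest records is an assumed rule, not a settled choice.
    """
    by_cell = defaultdict(list)
    for sid, cell, years in stations:
        by_cell[cell].append((years, sid))
    keep = []
    for members in by_cell.values():
        members.sort(reverse=True)               # longest records first
        keep.extend(sid for _, sid in members[:per_cell])
    return keep
```

The halving of the standard error assumes the data within a cell have independent errors of similar size, which is only roughly true for neighbouring stations.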

