Big Data Summit 2014: Traditional database and BI tools 'can't handle' time and uncertainty, says Met Office CIO
Traditional analytic technologies struggle with the volume of data and the multi-dimensional nature of the Met Office's work
Traditional database and business intelligence tools and technologies are inadequate for conducting big data analysis at the Meteorological Office because they cannot easily handle the "dimensions" of time and uncertainty.
That is the warning of Charles Ewen, CIO at the Met Office.
Speaking at the Big Data Summit 2014, Ewen described how the Met Office takes in feeds of weather and climate information from a variety of sources in order to generate its models.
"In order to do weather forecasting, the first thing that needs to happen is to gain an understanding of the current state of the atmosphere, and we do that through an observation programme. That programme is absolutely massive. It's a global endeavour and works through standards that the World Meteorological Organization has put down for years," said Ewen.
Those "observations" cross geographic and political boundaries and can come from aeroplanes, ship buoys and other recording stations. The Met Office's supercomputers also take in satellite data.
"Satellites are inferred observations. They are from the 'wrong side' of the atmosphere as well, so there's all kinds of limitations on what can be done with satellite information. But the amount of information being streamed from them is unbelievable," said Ewen.
"There's about 100 million 'observation messages' coming to Exeter [where the Met Office is based] every day and each message will consist of quite a lot of data in its own right."
These are fed into the Met Office's computers to generate a gridded, multi-variate, dense data set. The Met Office divides the world into a grid laid over the surface of the Earth, with up to 70 layers extending up into the atmosphere, to record the weather. For climate modelling, the data also needs to encompass ocean temperatures, handled in a similar manner.
Two further dimensions are then added to this three-dimensional model - time, of course, and "uncertainty". Said Ewen: "The shape of the data is four dimensional - the fourth dimension is time... Actually, it's really five dimensions with the fifth dimension that of uncertainty."
He continued: "You can't, with any degree of certainty, say what a chaotic system will look like, even in the short term, never mind the long term... So in order to cope with that, we run the model not just once, but a number of times under slightly different expressions of 'known uncertainties'."
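The idea of running a model several times under slightly different starting conditions can be illustrated with a toy example. The sketch below is not the Met Office's method or code: it substitutes the logistic map, a textbook chaotic system, for a weather model, and every name and parameter in it (run_ensemble, the perturbation size, the number of members) is an assumption made for illustration. It shows how a set of runs that start almost identically drift apart over time, so that the spread across runs becomes a measure of uncertainty at each time step.

```python
import random
import statistics


def logistic_step(x, r=3.9):
    """One step of the logistic map, a simple chaotic system (stand-in for a real model)."""
    return r * x * (1.0 - x)


def run_model(initial_state, steps):
    """Return the trajectory of a single run: one value per time step."""
    trajectory = [initial_state]
    for _ in range(steps):
        trajectory.append(logistic_step(trajectory[-1]))
    return trajectory


def run_ensemble(best_guess, members=20, perturbation=1e-4, steps=30, seed=0):
    """Run the model several times from slightly perturbed starting states.

    The perturbation plays the role of a 'known uncertainty' in the
    observed starting state; its size here is purely illustrative.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(members):
        start = best_guess + rng.uniform(-perturbation, perturbation)
        runs.append(run_model(start, steps))
    return runs


def spread_by_time(runs):
    """Standard deviation across ensemble members at each time step."""
    return [statistics.pstdev(values) for values in zip(*runs)]


ensemble = run_ensemble(best_guess=0.3)
spread = spread_by_time(ensemble)
print(f"spread at start:   {spread[0]:.6f}")
print(f"spread at step 30: {spread[-1]:.6f}")
```

In this toy version the data has only two dimensions, ensemble member and time. In the structure Ewen describes, each entry would itself be a full three-dimensional grid of atmospheric values, which is where the five dimensions come from.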
Because of both the complexity and volume of the data used at the Met Office, traditional database and business intelligence tools are inadequate.
"A lot of the current technologies in the kinds of data structures used to manipulate geo-spatial data are very 'flat-Earth' technologies. You will have real problems expressing this kind of information in polygons," said Ewen. Representing how the data changes over time is even more challenging using conventional technology.
However, he added, organisations ought to have an open-minded "mixed economy approach" to big data that embraces mainframe computers, if necessary, as well as traditional relational database technology, where appropriate.