Starschema fights COVID-19 with single source of truth for infection data
The free data set collates epidemiological information on COVID-19 cases reported by public health authorities worldwide, ready for analysis
One of the most concerning factors around COVID-19, the disease caused by the novel coronavirus, is how little information we have about it. Although our knowledge is increasing rapidly, there is still plenty of research to be done to combat the disease effectively. To that end, data services company Starschema has launched a free, public data set on the Snowflake Data Exchange, which brings together information on the incidence and mortality of COVID-19 worldwide.
"Everyone is dealing with the effects of COVID-19 in one way or another. Our goal is to deliver the highest quality data sound enough to stake lives on, with the utmost transparency," said Starschema CTO Tamas Foldi.
Starschema believes its free data repository will help organisations make contingency plans and inform data-driven decisions as they plan their response to the current emergency. Public and private sector data consumers will have access to the information in an analytics-ready format, so they can quickly build new models and applications. According to Starschema, the data set eliminates the need for, and the challenges associated with, cleaning and preparing the data.
The Snowflake Data Exchange is a secure, fully governed platform for sharing and exchanging data. Organisations can connect to the Data Exchange from within their Snowflake account for seamless integration of the COVID-19 incidence data set and fast query processing.
Starschema used the Data Exchange to bring epidemiological data from multiple sources together into a single source of truth, and to enrich that data with relevant information such as population densities and geolocation.
Public health authorities can use the data set to reference phylogenetic studies and identify whether particular strains of SARS-CoV-2, the virus that causes COVID-19, carry a higher risk. Governments will also be able to make civil contingency planning decisions based on data from neighbouring states. In the private sector, enterprises can use the data to support business contingency operations and analyse supply chains for vulnerabilities.
"Snowflake is helping us deliver on [our] goal so public health professionals, contingency planners and enterprises can best respond to this global epidemic," Foldi said.
Matt Glickman, Snowflake's Head of Data Exchange, added, "As the COVID-19 pandemic progresses, we can expect data to play an increasingly important role in both public and private operations. It's essential organisations have access to accurate, near real-time data in this rapidly evolving environment, and we're humbled that the platform architecture is positioned to help democratise access to Starschema's data in this time of need."
Starschema plans to further enrich its COVID-19 incidence data set on the Snowflake Data Exchange with data such as local emergency measures, demographic information for affected geographies, and additional reporting levels from regional, state, and country resources.
COVID-19 may be the most-visualised disease in history, with hundreds of organisations working together to share information and present it in usable formats. Johns Hopkins University launched a massive data collation and analysis tool early on in the crisis. Tableau is readying the same data for analysis, and Esri is mapping it. Even Reddit is getting involved in tracking the virus.
Organisations can request access to the free data set here.