Monitoring high availability clusters: Why Ping Identity chose Severalnines over SolarWinds
We needed a solution that monitors Galera MySQL clusters and MongoDB, chief reliability engineer tells Computing
"A lot of companies say they're distributed and fault tolerant, but in fact they reside in a single data centre or a single AWS region," said Michael Ward, chief site reliability engineer at Ping Identity, a provider of cloud authentication solutions to some of of the world's largest organisations.
"But PingOne [the firm's cloud-based single sign-on application] is currently deployed to four - soon to be five - different data centres throughout the world."
As IT systems get more distributed and characterised by multiple vendors and environments, maintaining visibility over the whole becomes ever more important. Add in requirements for zero downtime, strict data governance and minimal latency and you start to see why managing applications across multiple data centres in different time zones is not for the faint-hearted.
"If you don't design your applications and data stores that way from the ground up, it's very hard," Ward said.
Ward told Computing that thanks to the distributed architecture, even when the Boston data centre was flooded for several days during Superstorm Sandy in 2012, the service remained at full capacity.
The resilience is down to ensuring all systems and sub-systems - from a data centre down to a single server - are replicated synchronously and automatically. To ensure there is no drop-off in performance, it is also vital that monitoring and management systems are maintained in the event of a disaster.
"If we lose a data centre PingOne is inherently built to failover to an ultimate data centre. We need to be able to have our monitoring still up. If we lost our primary data centre we still need to be able to view all of our clusters," Ward explained.
The all-important configuration data is read and updated by various subsystems deployed in geographically dispersed data centres. This data needs to be strongly consistent across all these locations. To meet these requirements Ping Identity chose Galera Cluster for MySQL for its ability to synchronously replicate data across remote data centres.
However, MySQL is less suitable for some distributed computing tasks and for cases where a more flexible data model is required, so for other applications Ping Identity chose MongoDB, the open source NoSQL database. PingOne is therefore a complex application, even before you start considering its distributed architecture.
"It's about 20 different micro-services all of which have to interact with each other as well as with the databases. It's not an easy thing to accomplish. As soon as you add another data centre - or continent - the level of effort and diligence and testing that must be put in place to ensure availability becomes that much higher," said Ward.
PingOne is distributed across public cloud and private cloud data centres. To ensure that visibility is maintained across the piece Ping Identity deploys ClusterControl from Swedish company Severalnines. ClusterControl allows operational staff to manage the most popular open source databases (currently MySQL, PostgreSQL and MongoDB) via a single dashboard.
"We have MySQL Galera clusters in three data centres. Being able to see what's going on in all those clusters can help us determine all sorts of issues and narrow them down to a single server. It allows us to get the 10,000 foot view as well as the one foot view," Ward said.
"Consistency is important. While Galera allows us to see some pieces of information, ClusterControl allows us to see that across the whole deployment," he explained, reeling off a long list of performance metrics that are made visible via the dashboard, allowing his team to optimise performance and locate bottlenecks quickly.
Having a ClusterControl server in every data centre helps with maintaining 100 per cent uptime with no loss of performance, Ward said.
"It's basically a hot standby, which means you can have one in each data centre to account for failover issues and still be able to monitor the remainder of the cluster in a network segmentation type of scenario."
Asked why his company originally chose Severalnines three years ago, Ward said price was the main reason. ClusterControl was the best value for monitoring dispersed Galera MySQL clusters. Severalnines also had close ties with the Codership team that developed Galera. At the time Ward was also using the Zabbix monitoring tool and other SaaS-based tools, as well as the MONyog MySQL monitoring tool, which he said "doesn't handle Galera too well - we wanted something specific to our data stores".
Other options considered included SolarWinds.
"SolarWinds is a good robust tool but limited for data centres and it doesn't offer the value of Severalnines. It's a great solution but requires a lot management time. Severalnines is very simple to set up and get started with and do what you really need," said Ward.
However, one area in which it is lacking is support for Apache Cassandra, which at Ping Identity has replaced MongoDB as NoSQL data store of choice.
"There are a lot of NoSQL databases out there," said Vinay Joosery, Severalnines CEO, in response.
"We have not supported Cassandra so far, but we are taking steps to do so. It is a lot of work when you start monitoring a new data store, but Cassandra is hot and addresses a lot of interesting use cases so hopefully we can do something with it soon."