Meeting the peak: Optimising your databases to handle increased demand during lockdown
We are currently experiencing the greatest stress test in the history of the internet. Here's how to make sure your systems pass
The impact of coronavirus has been felt around the world. While some industries have had to close due to social distancing and lack of demand, companies in other sectors are experiencing overwhelming increases in systems usage and traffic. This includes more obvious industries such as video conferencing and collaboration tools providers, but also banks, pharmacies, online retailers and logistics companies. A significant number of businesses are seeing unprecedented growth, driven by people using more internet services and buying more online.
A recent survey by Percona revealed that 47 per cent of companies have seen an increase in traffic. One of our customers reported that their traffic levels had increased by 700 per cent during lockdown! Other companies selling products and services have had to take the drastic measure of queuing visitors online to manage demand.
These changes place pressure on the database instances that these companies use. Spikes in traffic inevitably lead to a slowing down of systems. Many customers will regard this as an outage, especially in today's online world of instant gratification. We are currently experiencing the greatest stress test in the history of the Internet.
Why database performance matters
During any unexpected event that generates a huge surge in demand, database performance will make or break an online business. While performance and scalability have long been requirements for successful websites and applications, during a crisis like Covid-19, it is vital for companies to be able to maintain outstanding application services. This means we all need to ensure continuous database availability and uptime.
The current surge in traffic can effectively be split in two. For online retailers, this huge spike in traffic will be generating greater profits. These thriving companies are more likely to have the budget to invest in immediate and long-term scalability strategies. On the other side, for financial services organisations such as banks and mortgage companies, as well as consumer-facing businesses like airlines, this spike will involve huge volumes of non-profitable traffic such as requests to delay mortgage payments or cancel flights. These businesses will need to maintain their database performance in a more cost-effective way.
Optimising performance - the reactive route
The biggest problem here is that, unlike a Black Friday or Cyber Monday, we are experiencing an event that could not be planned for. As a result, it's tempting to implement measures quickly to deal with these huge increases.
However, you don't have the luxury of re-architecting your application to meet a ten-fold spike in traffic, for example, because that would take months to carry out. Instead, you can look at your databases to achieve instant results that can help deal with all this traffic.
There are two things behind this - firstly, many DevOps teams don't tend to focus on database performance tuning and optimisation when it's not necessary. This means that there are usually good opportunities for improvement that already exist, based on judicious tweaking and an understanding of how databases work. Secondly, there are some alternative approaches that can help, even if they are not necessarily the most cost-effective solutions for the longer term.
So, what can you realistically do right now? Here are five suggestions:
1. Scale up your cloud service
Scaling the size of your instance is a low complexity and high impact solution if you are running your databases in the cloud. Moving to a larger instance size provides an easy (but expensive) approach that is ideal as a short-term response. However, this "scale by credit card" approach should only be a quick fix while you look for longer term methods of performance optimisation.
For those companies running private clouds or virtualised environments, this approach works if there are resources available that can be dedicated to the database instances. While there is not the same flexibility that public cloud providers offer, virtualisation can provide some ability to move workloads and free up resources, as long as your licences allow this. For companies running open source databases, licensing won't be an issue.
2. Disable nonessentials
When you need everything working for you in a crisis, look at what you can turn off to free up resources. Assess your application, identify all your critical features and disable any expensive, non-essential features or side workloads. This enables you to optimise your database performance and deal more effectively with surges in traffic.
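One common way to make non-essential features easy to switch off is a feature flag checked at the point where the expensive work would start. The sketch below is illustrative only: the `FeatureFlags` class, the `recommendations` flag name and the page-rendering functions are hypothetical names, not part of any particular product, and a real deployment would usually keep flags in shared configuration rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """In-memory switchboard for optional features (hypothetical helper)."""

    disabled: set = field(default_factory=set)

    def disable(self, name: str) -> None:
        self.disabled.add(name)

    def is_enabled(self, name: str) -> bool:
        return name not in self.disabled


def expensive_recommendations(product_id: int) -> list:
    # Stand-in for a query-heavy, non-essential side workload.
    return [product_id + 1, product_id + 2]


def render_product_page(product_id: int, flags: FeatureFlags) -> dict:
    # Critical path: the core product data is always served.
    page = {"product_id": product_id, "details": f"details for {product_id}"}
    # Optional extras are only computed while their flag is switched on.
    if flags.is_enabled("recommendations"):
        page["recommendations"] = expensive_recommendations(product_id)
    return page
```

During a surge, calling `flags.disable("recommendations")` removes that workload from every subsequent page request without a code change or redeployment.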
3. Freeze your code
As soon as you anticipate extra traffic, one quick way to simplify database management is to introduce a code freeze. This means that everyone is working on keeping existing services up and running rather than developing new functionality during a crisis. In a recent survey, we asked 100 customers what their number one database problem was. Untested code was their main challenge. A code freeze therefore can help you avoid adding additional problems during tough times.
4. Prioritise
An indirect way to improve system performance is to prioritise your services. For example, many retailers have prioritised the delivery of more essential items such as food and medicine, while non-essential items can still be delivered, but in longer timeframes. If you have not done this yet, consider splitting up your application or products so that you can prioritise what is important. Communicating this to customers can help too, as it sets expectations for service levels over time, as well as helping your database performance by reducing traffic volumes.
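Inside an application, the same idea can be expressed as a simple admission rule: as the database gets busier, lower-priority requests are deferred first. The following is a minimal sketch under stated assumptions. The three priority tiers, the 70 and 90 per cent thresholds and the `admit` function are all illustrative choices, and "active connections" is just one possible measure of load.

```python
from enum import IntEnum


class Priority(IntEnum):
    ESSENTIAL = 0   # e.g. orders for food and medicine
    STANDARD = 1    # normal customer traffic
    DEFERRABLE = 2  # work that can wait, such as reports


def admit(priority: Priority, active_connections: int, max_connections: int) -> bool:
    """Decide whether to serve a request now, based on current database load."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    utilisation = active_connections / max_connections
    if utilisation < 0.70:
        return True  # plenty of headroom: serve everything
    if utilisation < 0.90:
        return priority <= Priority.STANDARD  # defer only deferrable work
    return priority == Priority.ESSENTIAL  # near capacity: essentials only
```

A request that is not admitted would typically be queued or answered with a "try again later" message, which is where clear communication with customers comes in.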
5. Get some help
Bringing in consulting expertise can help when you have databases under stress. Getting help quickly can provide an effective way to make better use of existing resources and improve performance, whether you are running in the cloud or not.
Optimising performance - the proactive route
Alongside the short-term and reactive steps, you should think about how to improve your approach in the longer term. By taking good consulting advice, you can look at how to streamline your services, ensuring future shocks to the economy don't have as much impact.
1. Keep logs
All developers should have logs from their applications. Alongside providing information for specific debugging and incident investigation, logs can provide essential data, allowing you to carry out a complete diagnosis once the dust has settled. At the very least, this can help prevent a similar impact from any crisis that takes place in the future.
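As one example of a log that pays off after the event, an application can record every query that exceeds a time budget. The sketch below uses Python's standard `sqlite3` and `logging` modules purely for illustration. The `timed_query` helper, the `db.slowlog` logger name and the half-second threshold are assumptions, and most database servers also offer their own slow query logging.

```python
import logging
import sqlite3
import time

logger = logging.getLogger("db.slowlog")


def timed_query(conn, sql, params=(), slow_threshold_s=0.5):
    """Run a query and log it as a warning if it takes too long."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed >= slow_threshold_s:
        # Log the statement text, not the parameters, to keep data out of logs.
        logger.warning("slow query %.3fs rows=%d sql=%s", elapsed, len(rows), sql)
    return rows, elapsed
```

Reviewing these entries after a traffic spike shows which statements degraded first and are therefore the best candidates for tuning.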
2. Preventative maintenance
Planning ahead can have a massive impact, making it easier to avoid problems when you are under stress. To avoid future surges in traffic becoming a problem, you must have the right tools in place to identify, diagnose and fix issues. Without these tools, no one will realise what is happening until after the event. Using these tools in advance to carry out maintenance regularly can help avoid some of the issues in the first place.
3. Optimise in advance
Alongside maintenance, optimisation is something that you can carry out in advance. Using slow periods to pre-tune your database instances can help you ensure that everything is always up to date. This will enable you to manage spikes in traffic the next time they occur. This is also the ideal time to test systems under load and rigorously test code.
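Testing under load does not have to start with heavyweight tooling. As a rough sketch, the function below fires a number of concurrent calls at any operation and reports latency figures. The `run_load_test` name, its default request counts and the choice of median, 95th percentile and maximum are illustrative assumptions, and `operation` stands in for whatever query or API call you want to exercise.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(operation, requests=200, concurrency=20):
    """Call `operation` many times in parallel and summarise latencies."""

    def timed_call(_):
        start = time.perf_counter()
        operation()
        return time.perf_counter() - start

    # Worker threads issue the calls concurrently, up to `concurrency` at once.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(requests)))

    return {
        "requests": len(latencies),
        "median_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
        "max_s": latencies[-1],
    }
```

Running this during a quiet period at several concurrency levels, and keeping the results, gives you a baseline to compare against after each tuning change.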
4. Look at your failover strategy in more detail
Failover can make things worse by causing a service interruption and a more problematic slowdown. If you can avoid this then great; if not, then improving how you manage failover should be a priority.
One approach is to look at how to automate self-healing in your database clusters. This is particularly useful if you are running your application in containers using an orchestration tool like Kubernetes, as your database instances can be automated in the same way using Kubernetes operators. If and when the application goes down, Kubernetes will automatically restart the affected containers to rectify the problem and the database service can be restarted at the same time. Similarly, new containers launched to meet higher application demand can be automatically supported by new database instances.
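Automated restarts help most when the application can ride out the short gap while a database instance comes back. One way to do that, sketched below as an assumption rather than a prescribed method, is to retry failed calls with an exponentially growing delay. The `call_with_retry` helper, its default values and the use of Python's built-in `ConnectionError` as the failure signal are all illustrative, since real database drivers raise their own exception types.

```python
import time


def call_with_retry(operation, attempts=5, base_delay_s=0.2, sleep=time.sleep):
    """Retry `operation` on connection errors, doubling the wait each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay_s, then 2x, 4x, ... before the next attempt.
            sleep(base_delay_s * 2 ** attempt)
```

Capping the number of attempts matters here: unlimited retries from every application instance can add to the load on a database that is still recovering.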
Stressful times
Whatever your line of work, the current situation is bound to create additional stress, either for you and your team, or on your applications. Taking the time to think about your database can lead to improved performance levels and ensures that, whatever happens in the future, your data is not a cause for concern.
Matt Yonkovit is chief experience officer at Percona.