Cracking Kubernetes: Container Considerations at bet365
Alan Reed, Head of Sports Development, Hillside Technology, bet365's technology business, explains how the organisation employs Kubernetes
On average, it takes a container ship 40-50 days to get from the UK to Australia. While sea is far from the fastest mode of travel, we accept it's the most efficient means of transporting large volumes of goods halfway across the world.
At bet365, we don't have that luxury. Time is of the essence. So, when it was suggested that Kubernetes and containerisation could be an effective approach, many of us in Sports Development were sceptical.
The way the technology is sold, it feels far closer to a single package release, in that you reduce risk by transporting as much as possible. However, theory and practicality don't always go hand in hand. We understood the principles but couldn't see how they could work in the real world of bet365.
We execute hundreds of releases a week, dozens a day, and are always looking for efficiencies. For us, it's about speed and momentum. Of equal importance to us is the triumvirate of code quality, governance and the associated release procedures that support them.
It's why we adopt a mantra of small is beautiful. To optimise the speed of change and maintain the integrity of the code, it's crucial that each release is as lightweight and low-touch as possible. It enables us to be more surgical in what we release, to where and when.
We then saw the proof of concept that our Infrastructure team had engineered with the help of R&D and were blown away by the speed at which the containers could move. One analogy was it was like putting a jet engine on a freighter. But in all honesty, it was more like using a transporter on the Starship Enterprise. Deployment was that fast.
With the speed argument moot, it was now up to Sports Development to develop a container strategy. What follows is a look at the key considerations we examined on our journey.
Size
Following the demo, our first thought was to look at how we could size the container. In a world of rapid change, closely regulated markets and high degrees of governance, our challenge is in how we fragment our source code, rather than how we broadcast our data.
Because of the size of our architectural footprint, and our need to tailor our product to individual customers' needs, monolithic deployments of code to all destinations are not the way forward. What we hadn't appreciated was how configurable the container could be. We knew you could put an operating system in it, but for our immediate needs that would introduce unwanted complexity. However, there is also a minimum set of requirements the container needs to run the artefact inside it.
We had to consider our source code and how it would define each deployable component. As a result, we found that our application containers could be far smaller than we originally expected. Also, as containers are lightweight, they take up fewer resources, enabling you to save on hardware and data centre costs.
The size of our first container, with everything required to run our application inside it, was as small as 8MB. That's a significant reduction compared to the traditional approach of deploying the application onto a virtual machine, which can be measured in gigabytes in some cases.
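As an illustration of how an image that small can be achieved, a multi-stage build can compile a statically linked binary and copy it onto the empty `scratch` base, so the final image contains little more than the application itself. This is a generic sketch, not bet365's actual build; the Go toolchain and project layout are assumptions:

```dockerfile
# Build stage: compile a statically linked binary (Go used purely for illustration)
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 removes the libc dependency, so the binary runs on an empty base
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Final stage: 'scratch' is an empty image, so the result holds
# only the application binary - no shell, no package manager, no OS
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

Because the final stage carries no operating system, there is also far less surface area to patch, which is the same property the Security section below relies on.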
Durability
When making delta releases (releases where only the change is deployed, not the entire application), you focus more on making the change than on the management and consistency of the code base, or on the impact the change will have on any dependent environments.
As with all large interactive websites, our product offers diverse functionality across our customer base. Consequently, managing the fragmented codebase becomes a lot more complex and you can lose durability.
Before Kubernetes, durability wasn't something we'd considered. We hadn't appreciated that a container can live almost anywhere and in a range of configurations. We found that Kubernetes would allow us to have complex topologies while solving the issue of diversification.
This is because the point of facilitation is much closer to the source. We can make decisions far earlier in the lifecycle of the code, manage the differences earlier, and still maintain oversight of the entire codebase and the management efficiencies that come with it. While we had originally thought it was an infrastructure issue, we realised that we could solve the challenge of complexity through orchestration.
Management
Like many websites, ours sees peaky customer engagement. There are moments of intense activity and periods when volumes are a lot flatter. It's therefore critical that we can manage our resources to reflect the different activity profiles. During major sporting events, where we need capacity, we can spin up additional containers very quickly to deliver the extra resources needed.
However, we can also save resources and only deploy them when needed. If we are looking at releasing code outside of peak hours, then we can look at a far smaller number of containers. As our products are accessed globally, it is now possible to scale without complex configuration changes to the existing system. We can also scale the containers up and down, and run a different version of the same container in different parts of the world without risking cross-contamination.
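In Kubernetes terms, this kind of elasticity is usually expressed as a replica count plus a HorizontalPodAutoscaler that adjusts it automatically between a floor and a ceiling. The following is a minimal sketch; the workload name and thresholds are illustrative, not bet365's configuration:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sports-frontend          # hypothetical workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sports-frontend
  minReplicas: 3                 # quiet periods run a small footprint
  maxReplicas: 50                # major sporting events scale out quickly
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # add replicas once average CPU passes 70%
```

The same Deployment can also be resized manually (for example for an off-peak release window) without any change to the surrounding system, which is what makes scaling free of "complex configuration changes".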
With Kubernetes, our products are protected against failure by checking the health of our containers within the cluster. This gives us acceptable fault tolerance with self-healing and automatic replacement. If a container crashes, Kubernetes can seamlessly recover and start up new instances.
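The health checking described here is done by the kubelet through probes declared on each container: a failing liveness probe triggers a restart, while the readiness probe gates whether traffic is routed to the container at all. A sketch of the relevant fragment of a pod spec, with hypothetical names, paths and timings:

```yaml
# Fragment of a pod template - the kubelet restarts the container when the
# liveness probe fails repeatedly, and only sends it traffic once ready
containers:
  - name: sports-frontend                        # hypothetical container name
    image: registry.internal/sports-frontend:1.4 # hypothetical image
    livenessProbe:
      httpGet:
        path: /healthz                           # assumed health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3                        # three failures trigger a restart
    readinessProbe:
      httpGet:
        path: /ready                             # assumed readiness endpoint
        port: 8080
      periodSeconds: 5
```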
Additionally, if we want to separate a set of containers from each other, we can achieve this within the same cluster using Kubernetes' built-in system of namespaces. No longer do we need complex network configurations to isolate one system from another.
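Namespaces are ordinary Kubernetes objects, so the isolation described above costs little more than declaring them; each one scopes names, quotas and policies independently within the same cluster. A minimal sketch with hypothetical names:

```yaml
# Two namespaces give two systems independent scopes in one cluster;
# a Service "frontend" in each resolves separately, e.g.
# frontend.sports-live.svc.cluster.local vs frontend.sports-test.svc.cluster.local
apiVersion: v1
kind: Namespace
metadata:
  name: sports-live   # hypothetical
---
apiVersion: v1
kind: Namespace
metadata:
  name: sports-test   # hypothetical
```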
Security
Security is a key concern and it's critical that our development environments are as secure as possible.
Traditionally, this involves a robust system of monitoring and patching. However, because we've architected our base containers to be barebones runtime environments, we've made security simpler to manage.
Certainly, in the future we may need to include an operating system, and at that point the base containers will need greater scrutiny. We've already included a code of practice within the workflow that will enable us to do that.
Hot Releases
The ability to deploy code to our production estate, without activation, in parallel to our testing environment was an absolute must-have. There are times when we have to make changes very quickly, and the ability to make those changes while the system is under peak load is vital. At bet365, we call these hot releases.
We'd tried other hot release approaches with some success, though we found that the overhead of working with the existing deployment process meant we couldn't achieve the speed we wanted. The challenge is that, because of differences in the non-deployed environment configuration, code can behave differently in the test environment to how it would in live, so the gains weren't as rich as we'd hoped.
Again, Kubernetes surprised us. We found that you could deploy the same container to the testing environment and the production cluster in parallel. You then run the test, while the production container sits idle. If the test works you make the production container live. If things don't go as planned, then rollback is quick.
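One common way to realise this pattern, sketched here rather than taken from bet365's tooling, is blue/green switching: both versions of the container run in the cluster, and a Service's label selector decides which set receives live traffic. All names and labels below are illustrative:

```yaml
# The candidate pods carry the label track=candidate and sit idle in
# production while tests run against their twin in the test environment
apiVersion: v1
kind: Service
metadata:
  name: sports-frontend    # hypothetical service name
spec:
  selector:
    app: sports-frontend
    track: stable          # flip to "candidate" to make the new containers live
  ports:
    - port: 80
      targetPort: 8080
```

Flipping `track` from `stable` to `candidate` (for example with `kubectl patch`) activates the already-deployed containers instantly; flipping it back is the equally quick rollback.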
Traffic Strategy
There's a lot more to container design than meets the eye. While the application and the platform are not as tightly bound as in traditional systems, they are not mutually exclusive either.
A holistic approach is needed that requires Infrastructure and development to work closely together on facilitation. You have to understand your pipeline as much as you need to understand your product.
We started building containers before we had an environment that was container ready. We believe this was the right way around because you need to build the container first so that you understand how it runs and works and what's needed in terms of communication to and from the container.
Working with infrastructure allowed us to establish how to best route traffic to the cluster and more importantly to the appropriate set of containers. In addition to ingress traffic we also needed to understand how to best route egress traffic to other systems on our internal network outside of the Kubernetes cluster.
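Both directions of traffic can be expressed declaratively. The sketch below pairs an Ingress rule that routes external requests to a hypothetical frontend Service with a NetworkPolicy that limits egress from the namespace to an internal address range; every name, host and CIDR here is illustrative:

```yaml
# Ingress: route external traffic to the appropriate Service in the cluster
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sports-ingress              # hypothetical
spec:
  rules:
    - host: sports.example.internal # assumed hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sports-frontend
                port:
                  number: 80
---
# Egress: only allow outbound traffic to systems on an internal network range
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-internal-egress       # hypothetical
spec:
  podSelector: {}                   # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8        # assumed internal address range
```

Note that NetworkPolicy egress rules only take effect when the cluster's network plugin enforces them, which is one of the points where development and Infrastructure have to agree on the pipeline.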
Next Steps
The creation of our container strategy has been a key milestone in our Kubernetes journey, which Infrastructure set in motion. We are now looking to improve on this foundation as we explore the implications of moving more of our traditional systems to the platform.