How Ocado broke down its monoliths to create a scalable, saleable Smart Platform
Ocado wrote most of its own software and intends to sell it to other retailers - but the original monolithic architecture had to be torn down first
With 645,000 customers, 49,000 products and three distribution centres (for now - it is building a fourth), Ocado is the world's largest online grocer. However, even this disruptive firm has legacy technology.
"Even though we're a grocery retailer on the front of it, we're not your typical retail enterprise," said software engineering team lead Matthew Cornford, at Computing's DevOps Summit. "Behind the scenes there's a vast array of technology powering our retail business…
"We've learnt that just picking software off the shelf isn't sufficient for our real-world problems and our complex needs. To address these problems, we've had to build up competencies in-house."
Infrastructure team lead Luis Periquito said, "We had a very central typical enterprise solution, where if we want to grow things we just add more: we have central databases and if they run out of grunt, we add more CPUs and more RAM."
He continued, "As we have these massive monoliths, we force everything running on those machines to be exactly the same. Every single development team, every single application will have to work the same way [and] on the same environment."
This monolithic environment brought Ocado many challenges, which Cornford summarised under three headings: competition, painful deployment and narrow focus.
Competition: "In a warehouse the size of ours, there are multiple projects going on, and these were assigned project managers, whose role it was to ensure that the technology part of the project was delivered on-time… The developers weren't equipped to know which projects were the most important - which was going to have the most valuable impact. What often happened was that the IT project manager who could shout the loudest would get their work done first."
Painful deployment: "If I wanted to deliver a monolithic application, that means downtime… and reduces the throughput and efficiency of our warehouse… Luckily there was a natural downtime window at the end of the day, and all of our deployments had to be done in this time.
"We also had quite a lengthy approval process to go through. To get a deployment done we had to sign a form; chase a manager to sign it off; find someone in Operations to action our deployment; and this was all at the end of the day when everyone either wants to go home, or has already gone home…
"Add to that the fact that our code wouldn't actually start running until six or seven in the evening, so if something did go wrong that inevitably meant the developer on call would get called in the early evening - if they were lucky - or more likely in the middle of the night to sort out a problem."
Narrow focus: "Our production environments were imposed on us by the infrastructure team. This led to teams that were quite narrow in focus and specialised… The problem with this is when something goes wrong, you need everyone to sort it out because no-one has the big picture. I remember a few support calls with tens of people where everyone was just trying to figure out if the fault lay in their area. That tendency can lead to stress, conflict and mistrust between teams."
Breaking down the monoliths
"Our current challenge is to create a warehouse system that can power the world's largest retailers - so we couldn't use monoliths any more," said Periquito. "It's very hard to create monoliths; it's very hard to replicate... and they're very expensive to start with… Instead, we had to use models: buildable parts that we could grow as and when we need. If we need to pack more items, we add more bots.
"[In the new system] we have all these moving parts, so scale out naturally became very important. If we want to add more resources, we add more resources - we don't try to grow the resources we have. That means we can expend capital if and when we need to; hopefully we don't have such a huge capital expense at the building."
Cornford added, "The monoliths are mostly gone, and we've moved to a more service-based architecture; something that's horizontally scalable and expects underlying infrastructure to fail. This is great because it means that our applications are more reliable in the face of actual hardware failures, and in the face of planned failure, like deployments and upgrades… It means we can do these things more often."
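The article does not describe Ocado's actual code. As a rough illustration of what "horizontally scalable and expects underlying infrastructure to fail" can mean, here is a minimal Python sketch. The `Instance` and `ServicePool` classes are hypothetical, and the round-robin routing that skips failed instances is an assumed technique chosen for illustration, not Ocado's implementation.

```python
class InstanceDown(Exception):
    """Raised when a (hypothetical) service instance cannot handle a request."""


class Instance:
    """Hypothetical stateless service instance: any instance can serve any request."""

    def __init__(self, name):
        self.name = name
        self.healthy = True  # set to False to simulate a hardware failure or a deployment

    def handle(self, request):
        if not self.healthy:
            raise InstanceDown(self.name)
        return f"{self.name} handled {request}"


class ServicePool:
    """Assumed design: round-robin pool that scales out and routes around failed instances."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._next = 0  # index of the instance to try first on the next call

    def add_instance(self, instance):
        # Scale out: add another identical instance rather than growing an existing one.
        self.instances.append(instance)

    def call(self, request):
        # Try each instance at most once, starting from the round-robin position.
        n = len(self.instances)
        for offset in range(n):
            instance = self.instances[(self._next + offset) % n]
            try:
                result = instance.handle(request)
            except InstanceDown:
                continue  # failure is expected: move on to the next instance
            self._next = (self._next + offset + 1) % n
            return result
        raise RuntimeError("no healthy instances available")
```

In use, taking one instance down (for example, to deploy a new version) does not interrupt service, and adding capacity is just another `add_instance` call:

```python
pool = ServicePool([Instance("a"), Instance("b")])
pool.call("r1")                    # "a handled r1"
pool.instances[1].healthy = False  # "b" taken down, e.g. for an upgrade
pool.call("r2")                    # "a handled r2"
pool.add_instance(Instance("c"))   # scale out
pool.call("r3")                    # "c handled r3"
```

Because the instances are interchangeable, a planned outage looks the same to callers as an unplanned one, which is the property that lets deployments and upgrades happen more often.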
Ocado took its first steps towards becoming a technology supplier late last year, and it intends to continue this expansion. It is aiming at what Periquito called "the infrastructure Holy Grail": starting from a completely powered-off server room and warehouse, automatically powering everything on, deploying infrastructure and applications, and configuring the network without anyone touching it, anywhere in the world.
"This," he admitted, "is a really big challenge."