DevSecOps in space: the challenges of updating satellites on-orbit
'It’s tough to hit a toaster 100 miles above your head that’s moving at 20,000 miles per hour,' says Hypergiant's Bren Briggs
Bren Briggs, head of DevOps and cyber security at Hypergiant, explains how Raspberry Pis, Kubernetes, and a lot of technical know-how are changing the way satellites can be upgraded and repurposed on-orbit.
Eight years ago, the US Air Force barely had any dedicated software developers. Instead, the ethos was still to outsource such activities and focus on managing suppliers. That has now changed completely, with the Air Force - and latterly Space Force - embracing Agile development and operating numerous ‘software factories' in which suppliers, civilians and uniformed personnel collaborate on cutting-edge tech projects.
One such is Platform One, an enterprise DevSecOps services organisation for the US Department of Defense, which creates reusable infrastructure-as-code components such as IronBank, a registry of hardened container images, and Big Bang, a set of Kubernetes configurations and applications to standardise deployments. Platform One itself has spawned numerous projects, including a number in the field of satellites.
Like the Air Force itself, satellite technology has tended to be a long way from the cutting edge of IT, and for historically good reason. Space is an exceptionally harsh operating environment characterised by extremes of temperature, damaging radiation and limited energy supplies. Transferring equipment into space can be eye-wateringly expensive and terrestrial connections and bandwidth are often poor. Because of these limitations, satellites tend to use special space-rated hardware and bespoke software that's limited in functionality but built to last with infrequent upgrades - a world away from the continuous updates and commodity hardware favoured by most terrestrial operations these days.
The upshot is that on-board satellite IT is typically limited in functionality, cumbersome to update and very hard to adapt for new use cases, meaning the simplest option is often to abandon an old satellite and send up a new one, said Bren Briggs, head of DevOps and cyber security at AI/ML services company Hypergiant and a former guardsman. For the past couple of years, Briggs and his colleagues in the Air Force and at SUSE Rancher Government Services have been working on a number of projects to bring DevOps practices, containers and cheap commodity hardware to space, so that satellites can be more easily updated, repurposed quickly from the ground, and even support high-level edge compute functionality such as AI-based computer vision on-orbit.
Ultimately, the projects should result in a near-earth orbit satellite equipped with very cheap hardware that allows software updates to be beamed in when required, with advanced compute capabilities onboard.
The Rancher K3s Kubernetes distribution was chosen as the platform for the satellite connectivity project, dubbed SatOne, because it is small and stable, can be installed on Raspberry Pi 4s and Nvidia Jetson Nanos (the commodity hardware of choice), supports the ARM architecture, works in an airgapped environment with limited storage, and supports the GPUs that will run some AI models in space. Plus, of course, it's open source.
The first challenge when updating a satellite is maintaining contact, Briggs explained.
"It's tough to hit a toaster 100 miles above your head that's moving at 20,000 miles per hour," he said. "There's also the issue that the window from a single spot can be five or six minutes from acquisition of signal to loss of signal."
Connectivity is maintained by having a series of mission control systems around the world, a service that even Amazon and Azure are starting to offer, illustrating the rapid pace at which space technology is changing.
"We're commoditising space, even at the cloud level where we have ground stations that you can just rent access to, which is wild," Briggs said.
Nevertheless, there remain a number of hurdles to performing updates reliably and smoothly. Even with multiple ground stations, connectivity can be unreliable, meaning that the system must be able to handle a high level of asynchrony. In part this is achieved by having the satellite pull small updates when in range, in classic DevOps fashion.
"Essentially it's DevSecOps but in space," Briggs said.
"We're trying to enable GitOps where instead of waiting for an active connection we merge the infrastructure change to the main branch and next time the satellite has availability to one of our ground stations, it will check that, it'll pull the Git commit in and reconcile that difference."
Updates can also be delivered by periodically rebuilding against the base image to pull in any patches, then caching the resulting images at the mission control stations for the satellite to pull as it passes by.
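The pull-based pattern Briggs describes can be sketched in a few lines of Python. This is a minimal illustration, not the SatOne implementation: the `GroundStation` and `Satellite` classes, the commit identifiers and the image tags are all hypothetical, and a real deployment would rely on a GitOps controller running in the cluster rather than hand-written code. What it shows is that the satellite, not the ground, initiates the update, and that a missed pass simply leaves the current state running.

```python
from dataclasses import dataclass, field


@dataclass
class GroundStation:
    """Hypothetical stand-in for a Git remote reachable only during a pass."""
    head_commit: str
    manifests: dict  # commit id -> desired state, as {service: image tag}


@dataclass
class Satellite:
    """Hypothetical on-orbit agent that pulls and applies desired state."""
    applied_commit: str
    running: dict = field(default_factory=dict)  # service -> image tag

    def reconcile(self, station):
        """Pull the latest commit if in contact; return the changes applied."""
        if station is None:
            return {}  # no contact this pass: keep running the current state
        if station.head_commit == self.applied_commit:
            return {}  # nothing merged since the last successful pull
        desired = station.manifests[station.head_commit]
        # Only services whose image tag differs need to be touched.
        changes = {svc: tag for svc, tag in desired.items()
                   if self.running.get(svc) != tag}
        # Services no longer declared in the desired state are removed.
        for svc in [s for s in self.running if s not in desired]:
            del self.running[svc]
        self.running.update(changes)
        self.applied_commit = station.head_commit
        return changes
```

Calling `reconcile(None)` models a pass with no contact and changes nothing; calling it with a station whose `head_commit` has moved applies only the difference.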
Obviously, given the bandwidth constraints and short update windows, it's vital to minimise the size of the updates.
"We have to optimise our builds to make sure that we're not rebuilding the entire container every single time," Briggs explained, adding that the team has adopted the Rust language to keep the size down.
"Rust is fantastically small and we fell in love with it during this experiment for that reason. It is a little more difficult to write, but we're talking in the order of five to 10 megabytes for services sometimes smaller for decently large applications."
Tests are ongoing, but a satellite equipped with the new tech should be ready for launch in the next couple of years. In the meantime, Briggs has continued to work on another related Platform One project called EdgeOne, which deals with the fast-growing field of edge computing, creating systems that can be, to all intents and purposes, self-sufficient. Here, connectivity issues can be even more difficult than with satellites, because you cannot assume eventual connectivity. Setting aside that assumption requires a whole new thought process.
"Just having to think about operating in a remote disconnected state forces you to change a lot of patterns and failure modes," Briggs said.
"For instance, what happens if I arrive on location and I set up in a disconnected state but I never become connected. What does the failure mode look like here, how do we respond? And what happens if we eventually do get communication but it's a month down the road? So it's making sure that you can not only start disconnected but you continue to operate and execute your mission in a disconnected state, that's critically important for us."
While the work is still experimental, Briggs said customers are already interested in it, many in areas that he did not anticipate. Kubernetes use cases are multiplying rapidly at the edge, he added.
"They say ‘We want to replace this archaic system, and we want to use Raspberry Pis instead of bespoke hardware, and we want to replace this outdated protocol so we can push more than something that measured in baud.
"It's really cool that that other folks are noticing what Kubernetes can do on edge. For us it's finding all of these applications and use cases, continuing to develop and integrate, and then make the product a lot smoother."