Microsoft and Nvidia to bring end-to-end protected GPUs to Azure
Microsoft is working with Nvidia to bring confidential computing to Nvidia GPUs in the cloud, adding to the protected CPUs already available.
In today's world, the cloud serves a variety of data-related requirements for businesses of all sizes, including storage and AI solutions.
Organisations collect data at an unprecedented scale and use the cloud to train complex models and develop insights from it.
However, the increasing demand for data has raised concerns about security and privacy, especially in regulated industries like government, healthcare and banking.
Protecting the privacy and security of this sensitive data makes so-called confidential computing practices essential.
Confidential computing focuses on encrypting and securing data in use. The term refers to hardware- and software-based solutions that enable user data inside a computer's memory to be isolated while it is processed. The goal is to prevent the data from being exposed to the operating system, applications or other users of a cloud server.
Various organisations, including service providers, application developers and platform providers, are working to standardise a concept known as the trusted execution environment (TEE). TEEs are 'private' areas of a computer's memory where only specific programmes can read and write data.
Data within TEEs remains encrypted not just in transit or at rest, but also while it is in use. The approach has industry support, with both Intel (with SGX) and AMD (with SEV-SNP) already implementing TEEs in their CPUs.
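To picture the pattern TEEs make possible, consider attestation-gated key release: the data owner keeps a dataset encrypted and hands over the decryption key only once the environment proves, via a signed attestation, that it is running trusted code. The Python sketch below is a heavily simplified illustration of that idea; the helper names and the report structure are invented and do not correspond to any vendor's API.

```python
# Minimal sketch of the attestation-gated key-release pattern that TEEs
# enable. Everything here is illustrative: the 'report' dict and helper
# names are invented and do not correspond to any vendor's attestation API.
import hashlib
from cryptography.fernet import Fernet

# A known-good measurement (hash) of the code the TEE is expected to run.
EXPECTED_MEASUREMENT = hashlib.sha256(b"trusted-enclave-binary-v1").hexdigest()

def verify_attestation(report: dict) -> bool:
    # Real attestations are signed by a hardware root of trust; this check
    # only compares the reported code measurement against the trusted value.
    return report.get("measurement") == EXPECTED_MEASUREMENT

def release_key(report: dict, key: bytes) -> bytes | None:
    # The decryption key is handed over only after attestation succeeds.
    return key if verify_attestation(report) else None

# The data owner encrypts the dataset before it ever leaves their control.
key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(b"sensitive training data")

# The TEE presents its attestation; only then does it receive the key, so
# the data is decrypted exclusively inside the protected memory region.
report = {"measurement": EXPECTED_MEASUREMENT}
released = release_key(report, key)
if released is not None:
    plaintext = Fernet(released).decrypt(ciphertext)
```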
Microsoft already provides confidential computing solutions on Azure using this concept. It now wants to expand the approach to GPUs, so that data can be safely offloaded to more powerful hardware for processing.
Extending the trust boundary is not simple, according to Microsoft. Implementing confidential GPUs on Azure would require protecting them from different types of attacks while also ensuring that Azure host machines have adequate control for administrative activities.
The implementation also needs to avoid any negative impact on thermals or hardware performance.
Microsoft believes it can realise that vision by adding the following capabilities to the GPU:
- A new mode that isolates all sensitive GPU state, including GPU memory, from the host
- A hardware root-of-trust on the GPU chip that can provide verifiable attestations capturing all security sensitive state of the GPU, including firmware and microcode
- Hardware support for transparently encrypting all GPU-GPU communications over NVLink
- Support in the guest operating system and hypervisor for securely attaching GPUs to a CPU TEE, even if the contents of the CPU TEE are encrypted
- GPU driver extensions that check GPU attestations, establish a secure communication channel with the GPU, and transparently encrypt all communications between the CPU and GPU (a simplified sketch of this flow follows the list)
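To make that last point concrete, here is a rough Python sketch of how the driver-side flow could fit together: check the GPU's attestation against a trusted measurement, establish a session key, then encrypt everything that crosses the bus. Every name in it is hypothetical, and a real driver would negotiate the key with the GPU rather than generate one locally; treat it as a sketch of the pattern, not Nvidia's interface.

```python
# Hypothetical sketch of how the driver-side steps might fit together: verify
# the GPU's attestation, derive a session key, then encrypt every transfer
# that crosses the untrusted bus. The function names and the 'report' shape
# are invented for illustration; this is not Nvidia's actual driver API.
import os
import hashlib
import hmac
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Known-good measurement of the GPU firmware, standing in for the values a
# real verifier would obtain from the hardware root of trust's signed report.
TRUSTED_FIRMWARE_HASH = hashlib.sha256(b"gpu-firmware-v1").hexdigest()

def check_gpu_attestation(report: dict) -> bool:
    # Step 1: reject the device unless its reported firmware state matches.
    return hmac.compare_digest(report["firmware_hash"], TRUSTED_FIRMWARE_HASH)

def establish_session_key() -> bytes:
    # Step 2: a real driver would run a key exchange with the GPU; generating
    # a random key locally stands in for that handshake here.
    return AESGCM.generate_key(bit_length=256)

def encrypted_copy_to_gpu(key: bytes, payload: bytes) -> bytes:
    # Step 3: every CPU-to-GPU transfer is encrypted, so data on the bus is
    # never in the clear outside the two trust boundaries.
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, payload, None)

report = {"firmware_hash": TRUSTED_FIRMWARE_HASH}  # stand-in attestation
if check_gpu_attestation(report):
    session_key = establish_session_key()
    wire_bytes = encrypted_copy_to_gpu(session_key, b"model weights, batch 0")
```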
A new feature in Nvidia's A100 Tensor Core GPUs, called Ampere Protected Memory (APM), is a step towards this goal.
'With confidential computing support in Ampere A100 GPUs combined with hardware-protected VMs, enterprises will be able to use sensitive datasets to train and deploy more accurate models without compromising security or performance,' says Microsoft.
Microsoft is inviting users to sign up for the private preview of Azure confidential GPU virtual machines.
These VMs include up to four A100 GPUs, each with 80 GB of HBM, along with APM technology, which customers can use to run AI workloads more securely on Azure.
The Windows-maker adds, 'With confidential GPUs, you can set up a secure environment in the Azure cloud and run your machine learning workloads utilizing your favorite machine learning frameworks, and remotely verify that your VM boots with trusted code, the NVIDIA device driver for confidential GPUs, and that your data remains encrypted as it is transferred to and from the GPUs.'
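As a rough illustration of that remote verification from the tenant's side, the sketch below reduces it to comparing reported boot and driver measurements against a policy of known-good values. The component names and digests are invented; a real deployment would rely on signed evidence checked by an attestation service rather than a plain dictionary.

```python
# Illustrative only: what the tenant-side check might look like, reduced to
# comparing reported measurements against a policy of known-good values. A
# real deployment would use signed evidence and an attestation service; the
# component names and digests below are made up so the sketch runs.
import hashlib
import hmac

POLICY = {
    "vm_boot_loader": hashlib.sha256(b"trusted-bootloader").hexdigest(),
    "gpu_driver": hashlib.sha256(b"nvidia-confidential-driver").hexdigest(),
}

def verify_evidence(evidence: dict) -> bool:
    # Every measured component must match the policy before the tenant
    # entrusts the VM with sensitive data.
    return all(
        hmac.compare_digest(evidence.get(name, ""), digest)
        for name, digest in POLICY.items()
    )

evidence = dict(POLICY)  # a well-behaved VM reports matching measurements
assert verify_evidence(evidence)
```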