AMD unlocks the power of the GPU with Kaveri chips

AMD's 4th-generation APU chips place GPU cores on an equal footing with CPU cores, eliminating a performance bottleneck

AMD has officially launched its next generation of A-Series APUs, codenamed Kaveri, introducing a new architecture that places the CPU and GPU cores on an equal footing.

This promises to unlock the full parallel compute power of the GPU to accelerate certain applications, especially those involving media processing.

The Kaveri family initially consists of three 28nm chips: the A10-7850K, A10-7700K and A8-7600. The first two were made available during the CES show last week, while the third is due later this quarter.

Kaveri, which was detailed by AMD last year, forms the 4th generation of AMD's accelerated processing unit (APU) chips. These combine CPU and GPU cores on a single chip, with the intention that software can take advantage of the relative strengths of the two core types in a heterogeneous system architecture (HSA).

However, AMD's first generation of APUs, delivered in 2011, largely just placed what was essentially a discrete GPU design onto the same silicon as the CPU. The CPU still had to copy data from main memory and feed it to the GPU, as if the GPU were still a slave device sitting out on the PCI Express bus.

Kaveri finally eliminates this bottleneck by giving the on-chip GPU cores direct access to system memory via the same route as the CPU cores, an approach AMD dubs heterogeneous uniform memory access (hUMA).
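To illustrate the difference, here is a toy Python model, not real GPU code, contrasting the copy-in, copy-out pattern of earlier APUs with hUMA-style shared access. The function names are hypothetical, a Python list stands in for a buffer in system memory, and the return value simply counts the elements moved between the two memory spaces.

```python
# Toy model only: no real GPU is involved. A Python list stands in for a
# buffer in system memory, and gpu_scale() stands in for a GPU kernel.

def gpu_scale(buf, factor):
    """Hypothetical 'kernel': multiplies every element in place."""
    for i in range(len(buf)):
        buf[i] *= factor


def copy_model(host_data, factor):
    """Pre-hUMA pattern: copy to a separate GPU buffer, run, copy back.
    Returns the number of elements moved between memory spaces."""
    device_buf = list(host_data)   # copy host -> "GPU memory"
    gpu_scale(device_buf, factor)
    host_data[:] = device_buf      # copy "GPU memory" -> host
    return 2 * len(host_data)


def shared_model(host_data, factor):
    """hUMA-style pattern: the 'GPU' works directly on the CPU's buffer.
    Returns the number of elements moved, which is always zero."""
    gpu_scale(host_data, factor)
    return 0


data = [1, 2, 3]
print(copy_model(data, 2), data)    # two copies of three elements each
data = [1, 2, 3]
print(shared_model(data, 2), data)  # same result, no copies
```

Both paths produce the same result; the only difference is the data movement, which is the overhead hUMA is meant to remove. On real hardware the shared path would be reached through a programming framework such as OpenCL rather than plain Python.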

In addition, the cores support what AMD calls heterogeneous queuing (hQ), which means that any core can schedule tasks for any other core, whether it is a CPU pushing work to a GPU or vice versa.
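The idea behind hQ can be sketched with another toy Python model, assuming one work queue per processor and treating tasks as plain functions. The queue layout and the names enqueue, run_all and the task functions are hypothetical and do not reflect AMD's actual interface; the point is only that a task running on the GPU can push work back to the CPU as easily as the reverse.

```python
from collections import deque

# Toy model of heterogeneous queuing: each agent ("cpu", "gpu") owns a
# work queue, and a task running on any agent may enqueue work for any other.
queues = {"cpu": deque(), "gpu": deque()}
log = []


def enqueue(target, task):
    """Hypothetical helper: schedule a task on the target agent's queue."""
    queues[target].append(task)


def run_all():
    """Drain every queue until no agent has pending work."""
    while any(queues.values()):
        for agent, q in queues.items():
            while q:
                task = q.popleft()
                task(agent)


def cpu_prepare(agent):
    log.append(f"{agent}: prepare")
    enqueue("gpu", gpu_crunch)   # CPU pushes work to the GPU


def gpu_crunch(agent):
    log.append(f"{agent}: crunch")
    enqueue("cpu", cpu_finish)   # GPU pushes work back to the CPU


def cpu_finish(agent):
    log.append(f"{agent}: finish")


enqueue("cpu", cpu_prepare)
run_all()
print(log)
```

In a conventional model only the CPU would call enqueue; here the GPU-side task schedules the CPU's follow-up work itself, without a round trip through the host.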

This allows software to take full advantage of the CPU and GPU working together for the first time, according to AMD, and could even enable algorithms that were not possible before now.

"The primary purpose of HSA is unlocking the compute power of the GPU," said Sasa Marinkovic, senior manager of product marketing at AMD.

Because of the GPU's origins in accelerating gaming graphics, gaming is also AMD's initial target for the Kaveri APU platform, for example by boosting physics calculations in 3D games. AMD said the chip is capable of running games at full 1080p HD resolution at 30fps or higher.

However, the raw compute power of the GPU cores is likely to prove useful for other applications, such as scientific image processing and statistics, which may see Kaveri competing with Nvidia's GPU computing products.

The Kaveri family has been designed to have up to four of AMD's latest CPU cores, codenamed Steamroller, plus up to eight of its Graphics Core Next (GCN) compute cores based on its Hawaii GPU design. There is also a programmable digital signal processor (DSP) core and an enhanced version of AMD's on-chip Video Coding Engine (VCE).

Steamroller offers a 20 percent improvement in instructions per clock (IPC) over earlier cores, according to Marinkovic, while the Hawaii-based GPU cores deliver a 50 percent improvement in GPU performance. Taken together, these give a Kaveri APU a peak compute capability of 856GFlops, AMD claimed.
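As a rough check, the 856GFlops figure can be reproduced for the A10-7850K by adding the GPU's and the CPU's peak single-precision throughput. The per-unit numbers below (64 shaders per GCN compute unit, a 720MHz GPU clock, two floating-point operations per shader per cycle via fused multiply-add, and eight per CPU core per cycle) are assumptions not stated in this article.

```python
# Back-of-envelope peak throughput for the A10-7850K.
# ASSUMED figures (not from the article): 64 shaders per compute unit,
# 720MHz GPU clock, 2 FLOPs/shader/cycle (FMA), 8 FLOPs/CPU core/cycle.

GPU_COMPUTE_UNITS = 8          # from the article: up to eight GCN cores
SHADERS_PER_CU = 64            # assumption
GPU_CLOCK_GHZ = 0.72           # assumption
FLOPS_PER_SHADER_PER_CYCLE = 2 # assumption: one fused multiply-add

CPU_CORES = 4                  # from the article
CPU_CLOCK_GHZ = 3.7            # from the article: default clock
FLOPS_PER_CORE_PER_CYCLE = 8   # assumption

gpu_gflops = (GPU_COMPUTE_UNITS * SHADERS_PER_CU
              * GPU_CLOCK_GHZ * FLOPS_PER_SHADER_PER_CYCLE)
cpu_gflops = CPU_CORES * CPU_CLOCK_GHZ * FLOPS_PER_CORE_PER_CYCLE

print(f"GPU:   {gpu_gflops:.1f} GFLOPS")
print(f"CPU:   {cpu_gflops:.1f} GFLOPS")
print(f"Total: {gpu_gflops + cpu_gflops:.1f} GFLOPS")
```

Under these assumptions the GPU contributes roughly 737GFlops and the CPU roughly 118GFlops, or about 856GFlops combined, which shows that the headline number is dominated by the GPU cores.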

However, while AMD regards the CPU and GPU compute cores as equals, this does not mean they can necessarily handle the same tasks, as Marinkovic conceded.

"If you look in Task Manager on a PC, you will still only see four CPU cores, as Windows does not recognise the GPU cores yet," he told V3.

To unlock the power of the GPU cores, developers will need to turn to tools such as the OpenCL framework, which is designed for programming heterogeneous platforms. AMD said that support for direct programming of the GPU via Java is "a work in progress" and is set to arrive in Java 9.

Marinkovic said that some developer tools allow the programmer to choose whether a CPU or a GPU core is best suited to handle a specific code fragment, while others make this decision automatically when compiling the code.

Of the Kaveri APUs announced so far, the A10-7850K is the only one with a full complement of 4 CPU and 8 GPU cores. Clocked at a default 3.7GHz, this 95W chip costs $173.

The A10-7700K and A8-7600 both feature 4 CPU and 6 GPU cores. The former is another 95W chip, clocked at 3.4GHz and priced at $152, while the latter is available in 65W and 45W versions, clocked at 3.3GHz and 3.1GHz respectively, and is priced from $119.