The system-on-a-chip (SoC) for X86 will be a revolution for GPGPU. Why? Because a big problem right now is transferring data from CPU-memory to GPU-memory and back, and that problem disappears with SoCs. Below you can read why this architecture-target is very possible.
With AMD+ATI, Intel and its future high-end GPUs, and NVidia with the rumours around its own X86-chips, we will certainly see changes in this field. If this is indeed the way to go, what is probable?
ARM-processors are combined with GPUs a lot of times, but they currently lack support for a common shader-language (read: OpenCL) that would bring GPGPU within reach. We’ve asked ourselves many times why ARM & friends have been involved in OpenCL since the beginning, but still don’t have any public and promoted driver-support. More on ARM once there is more news on multi-core ARM-CPUs or OpenCL drivers.
The biggest problem with split CPU/GPU-functionality is that the bus-speed between the two is limited. The higher this speed, the more useful GPGPU becomes. The highest speeds are possible when the signal does not have to leave the chip and no concessions are made to the architecture of the graphics-card; in other words: gluing CPU and GPU together, but leaving the memory-buses the same.
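To get a feeling for what that bus-limit means in practice, here is a minimal sketch in plain C against the OpenCL 1.x host-API that times a single host-to-device copy with OpenCL’s event-profiler. Error-handling is left out and the 64 MB buffer-size is just an example value.

```c
/* Rough sketch: measure how fast data travels over the bus to the GPU.
 * Assumes an OpenCL 1.x SDK; error checking kept minimal for brevity. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

int main(void)
{
    const size_t bytes = 64 * 1024 * 1024;        /* 64 MB test block */
    char *host = (char*)malloc(bytes);

    cl_platform_id platform;
    cl_device_id   device;
    cl_int         err;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context       ctx   = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device,
                                                  CL_QUEUE_PROFILING_ENABLE, &err);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);

    /* Copy host memory to GPU memory and time it via the event-profiler. */
    cl_event evt;
    clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, bytes, host, 0, NULL, &evt);
    clWaitForEvents(1, &evt);

    cl_ulong start, end;                          /* timestamps in nanoseconds */
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof start, &start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof end, &end, NULL);

    double seconds = (end - start) * 1e-9;
    printf("Host -> device: %.2f GB/s\n", (bytes / seconds) / 1e9);

    clReleaseEvent(evt);
    clReleaseMemObject(buf);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    free(host);
    return 0;
}
```

On a discrete card this number is bounded by the PCIe-bus, no matter how fast the GDDR on the card itself is; that is exactly the bottleneck an SoC removes.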
Currently there are Intel’s Nehalem and AMD’s Fusion, but both use DDR3 for GPU and CPU alike; this will not really unlock the GPGPU-possibilities of high-end GPUs. It seems these products were designed with lower costs in mind.
But the chance that high-end GPUs will be integrated on the CPU is rising. Going to 32nm gives room for more functionality, such as GPUs. Other choices are smaller chips, more cores, and integrating the functionality of the motherboard’s north/south-bridge. If GPU-cores can be turned off when they turn out not to work optimally during testing in the factory (just as is done with multi-core CPUs), integrating high-end GPU-cores even becomes a safe choice.
Another way it could go is using optical buses between the GPU and CPU. It’s unknown whether that will reach mainstream markets soon enough.
Some levels of cache and all memory should be easily accessible by both types of cores. Why? Because eventually you want to switch between CPU- and GPU-instructions continuously. CUDA already has a nice feature which keeps objects synchronised between CPU and GPU; one step further is leaving out the need for synchronising altogether.
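The CUDA-feature meant here is presumably the mapped (“zero-copy”) host memory that came with CUDA 2.2: CPU and GPU address one and the same buffer, so there is no cudaMemcpy and no explicit synchronisation of the object. A host-side sketch in C (the kernel itself would live in a separate .cu file):

```c
/* Minimal sketch of CUDA's mapped (zero-copy) pinned memory: one buffer,
 * two pointers -- one valid on the CPU, one on the GPU. Host-side C only. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t n = 1024, bytes = n * sizeof(float);
    float *h_data;   /* pointer the CPU uses                         */
    float *d_data;   /* alias of the same memory, usable by the GPU  */

    /* Must be set before the first call that creates a CUDA context. */
    cudaSetDeviceFlags(cudaDeviceMapHost);

    /* Pinned host allocation that the GPU can map into its address space. */
    cudaHostAlloc((void**)&h_data, bytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer((void**)&d_data, h_data, 0);

    for (size_t i = 0; i < n; ++i) h_data[i] = 1.0f;

    /* A kernel launched with d_data (e.g. scale<<<blocks,threads>>>(d_data, n)
     * in a .cu file) reads and writes the very same physical memory the CPU
     * sees through h_data -- no copy in either direction. */
    cudaDeviceSynchronize();

    printf("h_data[0] = %f\n", h_data[0]);
    cudaFreeHost(h_data);
    return 0;
}
```

The catch today: those GPU-reads and -writes still cross the PCIe-bus, which is why this only becomes really attractive once CPU and GPU share one chip and one memory-system.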
The problem is that video-memory is accessed in a more parallel fashion to provide higher data-rates (GDDR5), so we don’t want to limit the GPU by attaching it to slower (= lower-bandwidth) DDR3. Doing it the other way around would then be the best solution: giving the CPU direct access to GDDR. There is always the option that a new type of (replaceable) memory will be used, which has a dual bus by design.
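To put rough numbers on that difference, here is a back-of-the-envelope calculation in C; peak bandwidth is simply the effective transfer-rate times the bus-width, and the clock- and bus-figures below are example values from hardware of around this time, not a benchmark.

```c
/* Theoretical peak bandwidth = effective transfer-rate (GT/s) x bus-width (bytes). */
#include <stdio.h>

static double peak_gbps(double gtransfers_per_sec, int bus_width_bits)
{
    return gtransfers_per_sec * (bus_width_bits / 8.0);   /* GB/s */
}

int main(void)
{
    /* GDDR5 on a 256-bit bus at 4.8 GT/s effective (e.g. a Radeon HD 5870). */
    printf("GDDR5, 256-bit bus:      %6.1f GB/s\n", peak_gbps(4.8, 256));
    /* DDR3-1333 in dual channel, i.e. a 128-bit path to the CPU.            */
    printf("DDR3-1333, dual channel: %6.1f GB/s\n", peak_gbps(1.333, 128));
    return 0;
}
```

Roughly a factor of seven; hooking a high-end GPU onto the DDR3-bus would throw most of that away.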
The hard part is memory-protection: now that more devices get access to the memory, the overhead of controlling/arranging the spots can increase enormously and might need a separate core for it – just like the Cell-processor has. This need for control is a reason I don’t expect access to each other’s memory before there is a fast bus between GPU and CPU, since then the access to GDDR via the GPU’s memory-manager will be much faster and maybe even fast enough.
If software were able to easily select devices and use the same code for each device, we would have made a giant step forwards. Software has always been one step behind hardware, so if you do not develop such techniques, you just have to wait a while.
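A minimal sketch of how that could already look with OpenCL today: the host-code asks for a GPU-device first and silently falls back to a CPU-device (think of AMD’s CPU-implementation), while the kernel-source stays exactly the same. The kernel, its name and the single-platform shortcut are just for illustration.

```c
/* Same kernel source, whichever device happens to be available. */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void square(__global float *x) {"
    "    size_t i = get_global_id(0);"
    "    x[i] = x[i] * x[i];"
    "}";

int main(void)
{
    cl_platform_id platform;
    cl_device_id   device;
    cl_int         err;
    clGetPlatformIDs(1, &platform, NULL);   /* simplified: first platform only */

    /* Same host code, same kernel source -- only the device-type differs. */
    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    if (err != CL_SUCCESS)
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);

    char name[256];
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof name, name, NULL);
    printf("Running on: %s\n", name);

    cl_context       ctx   = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "square", &err);

    float data[16];
    for (int i = 0; i < 16; ++i) data[i] = (float)i;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof data, data, &err);
    clSetKernelArg(kernel, 0, sizeof buf, &buf);

    size_t global = 16;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, sizeof data, data, 0, NULL, NULL);
    printf("data[3] squared = %f\n", data[3]);

    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(prog);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```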
Translating OpenCL into normal C and back will become possible in all kinds of ways, once there is more acceptance of (and thus demand for) GPGPU. AMD’s OpenCL-implementation for CPUs is also a way to merge the fields of CPU and GPU. It’s hard to tell how these techniques will merge, but it will certainly happen. Think of situations where some instructions are sent to the GPU by the OS even though the (OpenCL) programmer did not think of it. Or do you, when writing an OpenCL-kernel now, expect it to end up on an ARM-processor integrated in a near-future CPU?
See our article on the bright future of GPGPU to read more about it.
In case this is the way it goes, a lot will be possible for both OpenCL and CUDA – depending on market demands. Some possibilities will be discussed in an upcoming article about FPGAs, but also let me know what you think about X86-SoCs. Comment or send an e-mail.