Stream HPC

OpenCL mini buying guide for X86

Developing with OpenCL is fun, if you like debugging. Having software with support for OpenCL is even more fun, because no debugging is needed. But what would be a good machine? Below is an overview of what kind of hardware you have to think about; it is not in-depth, but gives you enough information to make a decision in your local or online computer store.

Companies that want to build a cluster: please contact us for information. Professional clusters need different hardware than described here.

CPU

OpenCL on the CPU is rather new; we still use the term GPGPU, and not much software has optimised code yet. But this will certainly change in the coming two years, and your PC will probably last at least that long.

Currently there is only one answer: Intel’s Sandy Bridge. Current AMD CPUs don’t support OpenCL as well as their upcoming Fusion processors or Sandy Bridge do. My advice: if you can, wait those three months to see whether AMD’s award-winning hybrid processors are actually as fast as they claim.

Sandy Bridge is the new generation of the i5 and i7. So if you had selected an i5 or i7 for your new machine, just buy the Sandy Bridge version. These are often even cheaper than their predecessors, but still more expensive than AMD’s CPUs.

GPU

NVIDIA or AMD? It really depends on what you want to do. Prices are for the US and the Netherlands as of 2011-01-17; please let me know if you find better prices, because I think the price difference for the NVIDIA card is strange.

  • Are you a developer? Buy them both. See Multi-GPU below.
  • Do you use software that currently only supports CUDA, and do you need it now? Buy the latest and greatest NVIDIA.
  • Do you want the best overall performance for the lowest price? Buy an AMD RADEON HD6970 (see picture, around €300/$380). It has 2.0 GB of memory and can do 134.4 GB/s.
  • Do you want the best overall performance from NVIDIA? Buy an NVIDIA GTX 580 (around €480/$500). It has 1.5 GB of memory and can do 177.4 GB/s.
  • Is your focus on FFT? Go for NVIDIA, which is faster at FFT (and thus at anti-aliasing, and thus in many game benchmarks).
  • No focus on FFT? AMD is pretty fast at many other OpenCL benchmarks, even with lower memory bandwidth.
  • Do you want a budget solution? Any NVIDIA or AMD RADEON from 2010/2011 is enough to get some performance increase.

What shouldn’t you buy?

  • NVIDIA TESLAs. For FFT (cards comparable to the GTX 580) and use of Infiniband (= higher bandwidth) they are top of the line, but they don’t have a good price/performance ratio.
  • AMD RADEON 4xxx series. The OpenCL support on these cards is not really good.
  • Never ever buy today’s cards. The NVIDIA GTX 600 series is coming in Q2 2011 (after the GTX 590), the RADEON HD 7000 series in Q4 2011 (after the HD 6990).

The above list is based on Sisoft benchmarks, an Xbit Labs benchmark, the local computer store and my own experience.

I want to emphasise that NVIDIA GPUs might look faster because they are faster at FFT and have higher memory bandwidth, but for OpenCL they are not always the fastest. So please don’t compare the cards by game benchmarks, but wait until February for the OpenCL benchmarks by Sisoft Sandra and Phoronix. Currently there is no way to say which graphics card is better. Just be happy you chose OpenCL instead of CUDA, so your software can run fast on any future machine (after some optimisation).

Soon the NVIDIA FERMI will hit the market. Not sure if the price/performance-ratio will be better than their TESLAs, but we will see.

Multi-GPU

In case you were wondering: OpenCL supports using multiple cards from different brands in one machine. Half a year ago I didn’t know this either, but I was happily surprised by the good news.

If you need pure horsepower for existing software, buy the same type and brand. It’s the safer choice, since not all software will have multi-device support in exotic configurations.

If you need to optimise your algorithms for both NVIDIA and AMD (as most developers do), buy two different cards. Together with the CPU you then have three different OpenCL devices on board to test all your algorithms on. Make sure the thick video cards actually fit on the board: each one takes up two slots. This matters when you want three cards on your board, so check the pictures of your motherboard of choice carefully.

Motherboard

Make the maximum memory speed your primary criterion, then select on the other options. With upgrading in mind, also look at the maximum amount of memory; you might want 24 or 32 GB in the future.

You need a dedicated PCIe x16 slot for each video card you order. The bad thing about PCIe is that its bandwidth is shared, so sending data to the video card(s) is limited by the demands of other cards (and on-board devices). PCIe 2.x x16 can do a massive 8 GB/s, but that is still limited compared to the incredible memory bandwidth you find on the graphics cards themselves. If you buy a motherboard with more than two slots, check the specifications very closely to see whether the PCIe x16 slots are shared.
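To get a feel for what 8 GB/s means in practice, here is a back-of-the-envelope sketch in Python. It takes the 8 GB/s figure from the text, ignores protocol overhead, and assumes (for illustration only) that cards sharing the same lanes split the bandwidth evenly; `transfer_seconds` is a hypothetical helper, not part of any API.

```python
# Back-of-the-envelope PCIe transfer times (a sketch, not a benchmark).
# Assumptions: ~8 GB/s for a PCIe 2.x x16 link, no protocol overhead, and
# cards that share the same lanes split the bandwidth evenly.

PCIE2_X16_GB_PER_S = 8.0


def transfer_seconds(gigabytes: float, cards_sharing: int = 1,
                     link_gb_per_s: float = PCIE2_X16_GB_PER_S) -> float:
    """Estimated seconds to copy `gigabytes` to one card when
    `cards_sharing` cards compete for the same link."""
    if cards_sharing < 1:
        raise ValueError("cards_sharing must be at least 1")
    effective_gb_per_s = link_gb_per_s / cards_sharing
    return gigabytes / effective_gb_per_s


if __name__ == "__main__":
    # Filling a 2 GB card over a dedicated slot vs. a slot shared by two cards.
    print(transfer_seconds(2.0))                   # 0.25
    print(transfer_seconds(2.0, cards_sharing=2))  # 0.5
```

Compare that to the card’s own memory bandwidth of well over 100 GB/s (134.4 GB/s for the HD6970 quoted above), and it is clear why transfers over PCIe should be kept to a minimum.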

Much faster PCIe 3.0 is expected in Q3/Q4 2011; alternatively, new GPUs might use Infiniband like NVIDIA’s TESLA.

I don’t know whether I should recommend buying a motherboard with on-board graphics. On the one hand you can use the expensive cards purely for OpenCL, but on the other hand you might get driver issues. So if it’s already there or costs you only a little more, take the chance and try it.

Memory

For working without irritation, the amount of memory you need is the total memory on your GPUs, plus a minimum of 2 GB for your OS, debugger and other software. For a 64-bit OS, add another 2 GB. So if you want two video cards with 2 GB each, you need 8 GB on a 64-bit OS. Note that you need a 64-bit OS for a single program to reserve more than 3 GB of memory. Under Linux a 32-bit OS can address more than 3 GB in total, but the limit per program stays at 3 GB.
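The rule of thumb above can be written down as a small Python helper. This is only a sketch of the text’s rule; the function name is hypothetical and the 2 GB figures are taken from the paragraph, not from any official sizing tool.

```python
from collections.abc import Sequence


def recommended_ram_gb(gpu_memory_gb: Sequence[float], os_64bit: bool = True) -> float:
    """System RAM suggested by this guide's rule of thumb."""
    base_gb = 2.0                              # OS, debugger and other software
    extra_64bit_gb = 2.0 if os_64bit else 0.0  # extra headroom on a 64-bit OS
    return sum(gpu_memory_gb) + base_gb + extra_64bit_gb


if __name__ == "__main__":
    # Two video cards with 2 GB each on a 64-bit OS.
    print(recommended_ram_gb([2.0, 2.0]))  # 8.0
```

For a mixed pair such as an HD6970 (2 GB) and a GTX 580 (1.5 GB) this gives 7.5 GB, so 8 GB is the practical minimum there.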

Why so much memory? You need to keep up with the speed of OpenCL, and thus be able to swap the complete video memory at full speed and to buffer between the OpenCL device and the hard drive. You or your boss won’t regret the extra bucks spent on memory. If your algorithms are totally different and don’t need pre- or post-processing of the data on the CPU, less memory might be sufficient. Just don’t forget the demanding debugger, which might cry for even more.

Hard drive

If you need to work with streaming data, know that the bottleneck is the hard drive (or network connection). An SSD can reach 500 MB/s, but can be more expensive than a normal hard drive with a large cache plus some extra memory for buffering. New hard drives can do 250 MB/s. To double the speed, use striping (RAID 0). Higher speeds are possible, but know that SATA 3.0 peaks at only 600 MB/s; also check this when selecting your RAID solution.
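A rough estimate of streaming throughput with striping can be sketched as follows. The sketch assumes striping scales linearly with the number of drives, caps each drive at the 600 MB/s SATA 3.0 figure from the text, and takes an optional, hypothetical `controller_limit_mb_s` that you would fill in from your RAID controller’s specifications.

```python
from typing import Optional

SATA3_LINK_MB_S = 600.0  # peak of one SATA 3.0 link, as quoted in the text


def striped_throughput_mb_s(drives: int, drive_mb_s: float,
                            controller_limit_mb_s: Optional[float] = None) -> float:
    """Ideal sequential throughput of a striped (RAID 0) array in MB/s."""
    if drives < 1:
        raise ValueError("need at least one drive")
    per_drive = min(float(drive_mb_s), SATA3_LINK_MB_S)  # a drive can't beat its link
    total = drives * per_drive
    if controller_limit_mb_s is not None:
        total = min(total, float(controller_limit_mb_s))  # controller bottleneck
    return total


def seconds_to_stream(gigabytes: float, throughput_mb_s: float) -> float:
    """Seconds to read `gigabytes` (1 GB = 1000 MB) at a given throughput."""
    return gigabytes * 1000.0 / throughput_mb_s


if __name__ == "__main__":
    print(striped_throughput_mb_s(1, 250))  # 250.0
    print(striped_throughput_mb_s(2, 250))  # 500.0: striping doubles the speed
    print(striped_throughput_mb_s(4, 250, controller_limit_mb_s=600))  # 600.0
    print(seconds_to_stream(10, striped_throughput_mb_s(2, 250)))  # 20.0
```

Real arrays fall short of these ideal numbers, so treat the result as an upper bound when deciding between SSDs and striped hard drives.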

Don’t worry about CPU usage for data handling; the days of IDE are over. But do watch out for cheap “soft” RAID solutions, which really do everything on the CPU. Do pay attention to the cache size of the hard drives and the actual sustained write speed (not burst). The burst speed includes the cache and is therefore not comparable to the actual speed.

Power Supply Unit

Know that you need loads of power to feed this baby when it’s peaking. For a system with one high-end video card you need about 600 – 650 Watts. For each extra video card, add some 200 – 250 Watts (depending on the card). Want to make sure your system can handle 3 video cards? Find yourself a PSU of at least 1150 Watts. If you want to be sure your system is stable at peaks, add around 75 extra Watts per video card. You can also combine two PSUs, where the second one feeds the extra video cards. Ask your local retailer for more information on the details of this setup, because it needs some caution.

Information found in the local computer store and on sites like Xbit Labs.
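The sizing rule above, using the upper end of each range, can be sketched as a small Python function. The defaults (650 W base, 250 W per extra card, an optional 75 W margin per card) come from the text; `psu_watts` is a hypothetical helper, and actual requirements depend on the cards, so check their specifications.

```python
def psu_watts(num_cards: int, base_w: int = 650, per_extra_card_w: int = 250,
              margin_per_card_w: int = 0) -> int:
    """Rough PSU size in Watts for a system with `num_cards` high-end cards."""
    if num_cards < 1:
        raise ValueError("this rule assumes at least one video card")
    return (base_w
            + (num_cards - 1) * per_extra_card_w  # each card beyond the first
            + num_cards * margin_per_card_w)      # optional stability margin


if __name__ == "__main__":
    print(psu_watts(1))                        # 650
    print(psu_watts(3))                        # 1150
    print(psu_watts(3, margin_per_card_w=75))  # 1375
```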

Cooling and Casing

The bad part about two or more video cards is that you’re almost forced to use water cooling or other expensive cooling systems to get the heat away. It is best to pay extra attention to the airflow in the case; a gamer’s computer case would help you a lot. Once you’ve built your computer, measure the heat very closely and test extensively by heating up the system for increasing periods of time (make a plan, starting at about 0.1 seconds with one video card). Know the limits of your hardware and have a red button in your test to quit it immediately. Or, much, much better: have your software read temperature data and act on it.
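The “red button” can be automated. Below is a minimal Python sketch of such a burn-in plan: load periods that double from 0.1 seconds, with the temperature checked after every short slice of load, and the test aborted as soon as a limit is exceeded. `run_slice` and `read_temp_c` are hypothetical callbacks: you would connect them to your own OpenCL test kernel and to whatever temperature source your hardware and drivers offer.

```python
import math
from typing import Callable, List, Optional, Tuple


def burn_in_plan(start_s: float = 0.1, factor: float = 2.0,
                 max_s: float = 3600.0) -> List[float]:
    """Load periods that grow geometrically from start_s up to max_s."""
    if start_s <= 0 or factor <= 1:
        raise ValueError("need start_s > 0 and factor > 1")
    periods = []
    period = start_s
    while period <= max_s:
        periods.append(period)
        period *= factor
    return periods


def run_burn_in(periods: List[float],
                run_slice: Callable[[float], None],
                read_temp_c: Callable[[], float],
                limit_c: float,
                slice_s: float = 0.1) -> Tuple[bool, Optional[float], Optional[float]]:
    """Run each period in short slices, checking the temperature after each.

    Returns (True, None, None) if every period completed, otherwise
    (False, period_that_failed, temperature_that_triggered_the_abort).
    """
    for period in periods:
        slices = max(1, math.ceil(period / slice_s))
        for _ in range(slices):
            run_slice(period / slices)  # hypothetical: put the GPU under load
            temp = read_temp_c()        # hypothetical: read a temperature sensor
            if temp > limit_c:
                return False, period, temp  # the "red button": stop right away
    return True, None, None
```

Start with one card, and only add the next card once the whole plan completes without tripping the limit.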

StreamHPC’s choices

You see no brands here (except on the photo), because for this article I did not do a deep comparison of all brands. Be sure not to buy overclocked versions: a flipped pixel does not matter for graphics, but a flipped bit does matter for OpenCL computations. Below are some configurations StreamHPC would choose. There is no max-performance system, because that would be like buying the most expensive champagne in a star restaurant; just spend the extra money on a yearly upgrade for long-term high performance.

Budget

Intel i5 SandyBridge, 8 GB memory, AMD RADEON HD6950.

This will give enough power to see huge differences between normal code and OpenCL code. If in doubt, spend some extra bucks on a motherboard with 3x PCIe 2.0 slots. Don’t skip this one too quickly, because it will do for most people!

Video-editor (using OpenCL-capable software)

Intel i7 SandyBridge, 16GB memory, 2 x AMD RADEON HD6970 or GTX 580.

Currently it is safest to buy NVIDIA because of the CUDA legacy in much software. On the other hand, I expect a lot to change because of Sandy Bridge.

OpenCL-Developer

Intel i7 SandyBridge, 8-16 GB memory, AMD RADEON HD6970 + GTX 580

Perfect for testing all three current high-end OpenCL devices in one machine. Great for comparison and for modifying your software to use both the OpenCL-capable CPU and a dedicated video card.

End 2011 OpenCL-Developer

AMD Bulldozer Fusion or Intel i9 SandyBridge, 24 GB memory, AMD RADEON HD7970 + GTX 680, Motherboard with PCIe 3.0.

Just so you know: a lot of speed is coming soon, which will make the difference between normal and OpenCL code even bigger.

Support your local schools by donating your old computer!