Last week AMD released ports of Caffe, Torch and (work-in-progress) MXNet, so these frameworks now work on AMD GPUs. With the Radeon Instinct MI6, MI8 and MI25 (25 TFLOPS half precision) to be released soon, having software run on these high-end GPUs is of course a necessity.
The ports were announced in December. The MI25 is claimed to be about 1.45x faster than the Titan XP. Now that the three frameworks have been released, current GPUs can be benchmarked and compared.
The expected performance/price ratio makes this very interesting, especially for large installations. Another slide listed the frameworks to be ported: Caffe, TensorFlow, Torch7, MXNet, CNTK, Chainer and Theano.
This leaves HIP ports of TensorFlow, CNTK, Chainer and Theano still to be released.
HIP is a subset of CUDA that works on modern AMD GCN GPUs, and AMD engineers are actively extending it as you read this. Specifying a subset makes it possible to split the focus between adding features and improving performance. You can read more about HIP in our blog post from last year on HIP and its potential. The fact that three frameworks were released simultaneously, while the team also worked on many other projects, shows that it is powerful indeed.
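To give an idea of what porting CUDA code to HIP involves: AMD's hipify tools translate CUDA API calls to their HIP equivalents, to a large extent by systematic renaming. The sketch below is a deliberately crude pure-Python approximation of that idea; the function name `hipify` and the single regex are ours for illustration, and the real tools do much more (kernel-launch syntax, library calls, header includes):

```python
import re

# Illustrative only: many cuda* API names map to hip* names one-to-one,
# e.g. cudaMalloc -> hipMalloc, cudaMemcpyHostToDevice -> hipMemcpyHostToDevice.
_CUDA_CALL = re.compile(r"\bcuda([A-Z]\w*)\b")

def hipify(source: str) -> str:
    """Rename cuda* API identifiers to their hip* counterparts."""
    return _CUDA_CALL.sub(lambda m: "hip" + m.group(1), source)
```

For example, `hipify("cudaMalloc(&ptr, size);")` yields `"hipMalloc(&ptr, size);"`. The point is that, because HIP mirrors the CUDA API so closely, much of a port is mechanical, which is why the frameworks could be ported relatively quickly.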
StreamHPC is a proud service partner for AMD's ROCm stack, which includes HIP. We can port your software to AMD hardware and make it run at maximum performance; thanks to years of experience with GPU coding, our code improvements have been proven to speed up both NVIDIA and AMD implementations.
Our blog post about ROCm 1.5 contains all the information on the driver stack, including how to install it.
Current hardware support is:
If you need your code to be benchmarked on AMD GPUs (daily), get in touch to learn more about our services.
Caffe was developed at the Berkeley Vision and Learning Center (BVLC). It is used for image analysis with convolutional neural networks (CNNs) and for regional analysis within images (Regions with Convolutional Neural Networks, or R-CNNs).
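The core operation Caffe spends most of its GPU time on is the convolution. As a reminder of what that computes, here is a minimal pure-Python sketch of a single-channel 2D convolution with valid padding and no strides (the name `conv2d` and the nested-list layout are ours for illustration; real frameworks use heavily optimized GPU kernels):

```python
def conv2d(image, kernel):
    """Slide a small kernel over an image; each output value is the
    sum of elementwise products over the overlapping window."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out
```

A CNN stacks many such layers (with learned kernels, multiple channels and nonlinearities in between), which is why these workloads map so well onto GPUs.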
AMD demonstrated the Caffe port at SC16, with a focus on how little time it took to port it with HIP. The original plan was to release it with ROCm 1.5, as the required features first had to be performance-optimized within ROCm.
Torch was originally developed at NYU, and is based upon the scripting language Lua, which was designed to be portable, fast, extensible, and easy to use in development. Lua was also designed to have an easy-to-use syntax, which is reflected by Torch’s syntactic ease of use. Torch features a large number of community-contributed packages, giving Torch a versatile range of support and functionality.
MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler and a graph optimizer that automatically parallelize both symbolic and imperative operations on the fly, while optimizing for both execution and memory efficiency. It also adds a collection of blueprints and guidelines for building deep learning systems.
NB: this port is said to be work-in-progress.
AMD has been releasing several libraries that work with hcc or HIP:
We can expect more software that depends on these libraries to be ported to AMD GPUs. Which do you think is next? Put your best bet in the comments.
It is important that the code gets integrated upstream, so that the three frameworks officially work on both NVIDIA and AMD GPUs. This can only happen when the maintainers know about these ports and understand that there is demand. If you find the ports useful, open an issue on GitHub to show there is a need.