Stream HPC

Beyond function there is performance.

We work in the niche of GPGPU computing, where GPUs are programmed to efficiently run scientific and large-scale simulations, AI training and inference, and other compute-intensive mathematical software. Customers, mostly from the US and Europe, trust us as a recognized expert to speed up their software.

Our projects range from several person-weeks to fix software performance problems to several person-years to build extensive high-performance software and libraries.

Join a growing list of companies that trust us with designing and building their core software with performance in mind.

  •   We work with all the technologies that perform  

    • "Khronos contracted Stream to implement the OpenCL 2.2 conformance test suite.
      Not only was their delivered work of excellent quality, but their insights and attention
      to detail has helped improve the overall OpenCL specification and ecosystem.
      Khronos values Stream’s continued involvement in the OpenCL working group"

Neil Trevett, OpenCL Chair and Khronos President

  • “Stream is an elite, dependable and unique development outfit.
    We utilized Stream to achieve a ~40,000x speedup for our quantum simulation software.
    If your project demands ultra fast design and robust implementation, work with them.”

Jordan Ash, CEO Noospheer

  • “The efforts StreamHPC put into accelerating pyPaSWAS are very much appreciated.
    The speed-up has a direct effect on our ability to run even larger data sets.
    Also, StreamHPC shows that having computer science engineers on board life science
    projects is of great importance”

Sven Warris, WUR, PaSWAS project lead

  • "In 2014, we invented a complex and expensive algorithm which was the first in
    its kind to locate and recognize objects in digital photos. This particular algorithm
    was already accelerated using SSE-instructions. However, our application was to
    process an incoming firehose of tens of thousands of user generated photos per
    minute. Stream delivered optimized code in three weeks of time with full
    documentation. The code ran 20 times faster on a high-end GTX than an 8-core Xeon."

Harro Stokman, CEO Euvision Technologies

  • “(Stream’s) faster software allows us to run many more process simulations which
    has a large impact on the role of our models in our product development work.”

Tata Steel Research & Development

A selection of our customers

We have helped many companies become more competitive, but we cannot name most of them here as of today. Below are examples that are public.

RocRAND. The world’s fastest random number generator is built for AMD GPUs, and it’s open source. Generating random numbers at several hundred gigabytes per second, the library can speed up existing code many times over. The code is faster than Nvidia’s cuRAND and is therefore the preferred library on any high-end GPU.

RocPRIM. A fully open-source version of CUB optimised for AMD GPUs. This enables software like TensorFlow to run on AMD hardware at full performance.

OpenCL 2.2 test suite. When a hardware company wants to support OpenCL 2.2 on its processor, it needs a large test suite to test its drivers and device. We made that update, which was a big change from 2.1 because of the addition of C++ kernels. We hope to see more devices support OpenCL 2.2, and we hope implementers find the new test suite complete and correct.

GROMACS performs soft-matter simulations at the molecular scale

We ported GROMACS to OpenCL and optimised the code for use with AMD FirePro accelerators. This resulted in code that is as fast as the CUDA version. GROMACS is used worldwide by over 5000 research centers, for work ranging from simulating molecular docking to examining the hydrogen bonds in a falling water drop. Read more…

For Stanford University we optimised a part of TeraChem, a general-purpose quantum chemistry package designed to run on NVIDIA GPU architectures. Our work added an extra 70% performance on top of the already optimised CUDA code.

For the University of Manchester we achieved a large speedup with UNIFAC when going from OpenMP code to optimised OpenCL. Where OpenMP could get the single-threaded code down to about 8 seconds, we brought it down to 0.062 seconds. Read more…

We helped the Memorial Sloan-Kettering Cancer Center improve a tool they use daily. Where a run previously took one hour, it now takes just two minutes: a speed-up of 30x. Their productivity rose, because they no longer had to wait as long for results and could get more done without buying new computers.

Success stories

Want to read more about what we did? Read about the work we do.

Our customers did not want to hire another team later to fix performance; they wanted the code to be fast the first time.

Technologies we work with

CUDA

HSA

OpenMP

OpenACC

ROCm