Several recent developments in GPU encoding are worth noting, as they will have a major impact on teams with significant video operations.

Microsoft Azure launched its N Series virtual machines. This offering has two significant advantages over the older G series offered by Amazon AWS: 1) it allows multiple GPU cards on a single VM instance, and 2) it has better hardware specifications, with more GPU cores and double the RAM on the newer Tesla series card. We have posted testing data on our site under the GPU acceleration tab for Brevity. For workflows that benefit from GPU acceleration, it takes about 16 VM CPU cores to equal the performance of a single GPU instance, and on a monthly basis the GPU VM costs about half as much to operate.
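The cost trade-off above can be sketched as a small back-of-the-envelope calculation. The 16-core equivalence comes from our testing; the dollar figures below are made up purely for illustration and are not real Azure or AWS prices.

```python
# Illustrative comparison of CPU-only vs GPU-accelerated encoding VMs.
# The 16:1 core equivalence is from our testing; prices are hypothetical.
CPU_CORES_PER_GPU_EQUIV = 16  # ~16 VM CPU cores match one GPU instance

def cost_per_gpu_equivalent(cpu_core_monthly_cost: float,
                            gpu_vm_monthly_cost: float) -> dict:
    """Monthly cost of matching one GPU VM's throughput with CPU cores."""
    cpu_cost = CPU_CORES_PER_GPU_EQUIV * cpu_core_monthly_cost
    return {
        "cpu_equivalent_cost": cpu_cost,
        "gpu_vm_cost": gpu_vm_monthly_cost,
        "gpu_cost_ratio": gpu_vm_monthly_cost / cpu_cost,
    }

# Hypothetical prices: $50 per core-month for CPU, $400/month for the GPU VM.
result = cost_per_gpu_equivalent(50.0, 400.0)
print(result)  # a gpu_cost_ratio of 0.5 matches the "about half the cost" observation
```

Plugging in real per-core and per-GPU-VM prices from your cloud bill gives the actual break-even for your workload.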

AWS updated its instance types to include a new P-type instance, which is targeted at AI and research applications that need GPU acceleration. The recommendation for multimedia applications like encoding is still the G series, which has more CPU processing power than the N series. For encoding jobs that require a mix of GPU and CPU processing, the G series would be the faster choice.

On the bare metal front, NVIDIA has started to ship Quadro cards with the Pascal architecture. The card is a great option for enterprises with important workloads that can invest some upfront CAPEX to boost output. Beyond the enterprise SLA and hardware manufacturing specifications, the feature that sets this card apart from other GPU offerings is the sheer number of simultaneous operations it can handle: dozens of encode tasks can be linked to a single decode task. For operations like creating ABR outputs, where you decode one source and encode several versions, this allows an entire file set to be created from a single decode process. Even with 1080p content and resized outputs, the encoding factor hovers around 0.25x real time for an entire output package, which could include, for example, an HLS encoding ladder.
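The single-decode, multi-encode pattern can be sketched with ffmpeg, which supports NVIDIA's NVDEC/NVENC pipeline. This is a minimal illustration, not our production pipeline: it assumes an ffmpeg build compiled with CUDA support, and the ladder rungs and bitrates are invented for the example.

```python
# Sketch of a single-decode, multi-encode ABR ladder via ffmpeg with NVENC.
# Assumes an ffmpeg build with CUDA/NVDEC/NVENC enabled; rungs are illustrative.
LADDER = [  # (output height, video bitrate) for a hypothetical HLS ladder
    (1080, "5000k"),
    (720, "3000k"),
    (480, "1500k"),
    (360, "800k"),
]

def build_nvenc_command(src: str) -> list:
    """Decode the source once on the GPU and fan out to several NVENC encodes."""
    cmd = ["ffmpeg",
           "-hwaccel", "cuda",                 # decode on the GPU (NVDEC)
           "-hwaccel_output_format", "cuda",   # keep frames in GPU memory
           "-i", src]
    for height, bitrate in LADDER:
        cmd += [
            "-vf", "scale_cuda=-2:{}".format(height),  # resize on the GPU
            "-c:v", "h264_nvenc", "-b:v", bitrate,
            "out_{}p.mp4".format(height),
        ]
    return cmd

print(" ".join(build_nvenc_command("source.mp4")))
```

Because ffmpeg applies per-output options to the output that follows them, one input decode feeds all four encoders, which is exactly the one-decode-to-many-encodes pattern the Quadro cards are built for.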

Note that all of these discussions center on NVIDIA cards, which use a helper API called CUDA. OpenCL is an alternative helper API that works with other GPUs from companies like AMD and Intel. In our testing of specific media-workflow use cases, we saw an across-the-board advantage of about 20% for CUDA over OpenCL. Supporting desktop applications that target macOS and non-NVIDIA GPUs would require significant code development for the Apple Metal API and OpenCL. For Brevity, the accelerated features run in a Docker container on AWS, Azure, a hypervisor, or a bare-metal server, and acceleration kicks in when NVIDIA hardware is detected.
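A runtime check of this kind can be as simple as the sketch below. This is not Brevity's actual detection code, just a common heuristic: look for the `nvidia-smi` tool or the driver's version file that the NVIDIA kernel module exposes on Linux hosts and containers.

```python
# Minimal sketch of a runtime check a container can use to decide whether
# NVIDIA acceleration is available (illustrative, not Brevity's real logic).
import os
import shutil

def nvidia_gpu_available() -> bool:
    """Heuristic: the nvidia-smi tool or the NVIDIA driver files are visible."""
    if shutil.which("nvidia-smi") is not None:
        return True
    # The NVIDIA kernel driver exposes this path on Linux when loaded.
    return os.path.exists("/proc/driver/nvidia/version")

use_gpu = nvidia_gpu_available()
print("GPU acceleration:", "enabled" if use_gpu else "falling back to CPU")
```

When the check fails, the same Docker image simply runs the CPU code path, which is what lets one container target AWS, Azure, a hypervisor, or bare metal unchanged.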