Radeon Instinct Hardware: Polaris, Fiji, Vega

Diving deeper into matters, let’s talk about the Radeon Instinct cards themselves. The Instinct cards are for all practical purposes a successor (or spin-off) to AMD’s current FirePro S series cards, so if you are familiar with AMD’s hardware there, then you know what to expect. Passively cooled cards geared for large scale server installations, offered across a range of power and performance options.

As this is a new product line the Instinct cards don’t have any immediate predecessors in AMD’s FirePro S lineup, but unsurprisingly, AMD has structured their new family of server cards similar to how NVIDIA has structured their P4/P40/P100 lineup of deep learning cards. All told, AMD is announcing 3 cards today, all 3 which tap different AMD GPUs, and are (roughly) named after their expected performance levels.

AMD Radeon Instinct
  Instinct MI6 Instinct MI8 Instinct MI25
Memory Type 16GB GDDR5 4GB HBM "High Bandwidth Cache and Controller"
Memory Bandwidth 224GB/sec 512GB/sec ?
Single Precision (FP32) 5.7 TFLOPS 8.2 TFLOPS 12.5 TFLOPS
Half Precision (FP16) 5.7 TFLOPS 8.2 TFLOPS 25 TFLOPS
TDP <150W <175W <300W
Cooling Passive Passive
(SFF)
Passive
GPU Polaris 10 Fiji Vega
Manufacturing Process GloFo 14nm TSMC 28nm ?

Starting things off, we have the Radeon Instinct MI6. This is a Polaris 10 card analogous to the consumer RX 480. As Polaris doesn’t have much in the way of special capabilities for deep learning (more on this in a second), AMD is pitching the card as their baseline card for neural network inference (execution). At 5.7 TFLOPS (FP16 or FP32) it will draw under 150W, and while pricing for the family hasn’t been announced, I believe it’s a safe bet that as the baseline card the MI6 will offer the best performance per dollar across the Instinct family.

Meanwhile in an unexpected move, AMD will be keeping their 2015 Fiji GPU around for the second card, the Instinct MI8. This card is for all intents and purposes a rebranded Radeon R9 Nano, AMD’s power tuned Fiji card that has proven quite popular with their server customers. Within the Instinct lineup, it is essentially an unusual variant to the MI6, offering higher throughput and greatly increased memory bandwidth for only a small increase in power consumption, with the drawback of Fiji’s 4GB VRAM limitation. Since it offers better performance than the MI6 and is smaller to boot, I expect we’ll see AMD pitch the MI8 as a premium alterative for inference.

The MI6 and MI8 will be going up against NVIDIA’s P4 and P40 accelerators. AMD’s cards don’t directly line-up against the NVIDIA cards in power consumption or expected performance, so the competitive landscape is somewhat broad, but those are the cards AMD will need to dethrone in the inference landscape. One potential issue here that I’m waiting to see if and how AMD addresses closer to the launch of the Instinct family will be the lack of high-speeds modes for lower precision operations. The competing Tesla cards can process 8-bit integer (INT8) operations at up to 4x speed, something the MI6 and MI8 Instinct cards can’t do. INT8 is something of a special case, but if NVIDIA’s expectations for inferencing with INT8 come to pass, then it means AMD has to compete more strongly on price than performance.

Last, but certainly not least in the Instinct family is the most powerful card of them all, and arguably the cornerstone of what the family is meant to become: the MI25. This is based on AMD’s forthcoming Vega GPU family, and while AMD is not sharing much in the way of new details on Vega today, they are leaving no doubts that this is going to be a high performance card. The passively cooled card is rated for sub-300W operation, and based on AMD performance projections elsewhere, AMD makes it clear that they’re targeting 25 TFLOPS FP16 (12.5 TFLOPS FP32) performance.

Significantly, of the few things AMD is saying about Vega right now, is that they’re confirming that it supports packed math formats for FP16 operations. This is something that first appears in Sony’s Playstation 4 Pro, with a strong hint that it was a feature of a future AMD architecture, and now this has been confirmed.

With AMD pitching the MI25 as a training accelerator, offering a packed math mode for FP16 is critical to the product. Neural network training very rarely requires higher precision FP32 math, which is otherwise the default for GPUs. Instead, FP16 is suitably precise for a process that is inherently imprecise, and as a result offering a fast FP16 mode makes the card significantly faster at its intended task. Coupled with the already high throughput rates of GPUs due to their wide arrays of ALUs, and this is what makes GPUs so potent at neural network training.

As AMD’s sole training card, the MI25 will be going up against NVIDIA’s flagship accelerator, the Tesla P100. And as opposed to the inference cards, this has the potential to be a much closer fight. AMD has parity on packed instructions, with performance that on paper would exceed the P100. AMD has yet to fully unveil what Vega can do – we have no idea what “NCU” stands for or what AMD’s “high bandwidth cache and controller” are all about – but on the surface there’s the potential for the kind of knock-down fight at the top that makes for an interesting spectacle. And for AMD the stakes are huge; even if they can’t necessarily win, being able to price the MI25 even remotely close to the P100 would give them huge margins. More practically speaking, it means they could afford to significantly undercut NVIDIA in this space to capture market share while still making a tidy profit.

On a final note, while AMD isn’t commenting on the future of FirePro S or other server GPU products – so it’s not clear if Instinct will be their entire server GPU backbone or only part of it – it’s interesting to note that they are pointing out that one of the ways they intend to stand out from NVIDIA is to not restrict their virtualization support to certain cards.

In other words, if Instinct does end up being AMD’s sole line of server cards, then these cards will be fully capable of serving the virtualization market just as well as the deep learning markets.

AMD Announces Radeon Instinct: GPU Accelerators for Deep Learning Software, Servers, & Closing Thoughts
POST A COMMENT

39 Comments

View All Comments

  • TheinsanegamerN - Tuesday, December 13, 2016 - link

    the kind of servers these cards aim for should be able to handle the load. And odds are many will simply be buying new servers with new hardware, rather then buying new cards and putting them in old servers. Reply
  • Michael Bay - Monday, December 12, 2016 - link

    I`d like to know if AMD is sharing PR team with Seagate, or vice versa. Oh those product names. Reply
  • Yojimbo - Monday, December 12, 2016 - link

    This is probably going to force NVIDIA to rethink the current strategic product segmentation they've implemented by withholding packed FP16 support from the Titan X cards. These announced products from AMD don't really compete with the P100, I think, but they are appealing for anyone thinking of training neural networks on the Titan X or scaled out servers using the Titan X. The Volta-based iteration of the Titan X may need to include FP16 support, which may then force that support onto the Volta-based P40 and P4 replacements, as well.

    These AMD products are well too late to affect the Pascal generation cards, though. It takes a long time for a new product to be qualified for a large server and I'm guessing the middleware and framework support isn't really there either, and isn't likely to be up to snuff for a while.
    Reply
  • p1esk - Monday, December 12, 2016 - link

    Amen, brother. This separation of training/inference to run on different hardware pissed me off. I hope Nvidia gets a little bit of competition.
    On the other hand, maybe we will find a way to train networks with 8 bits precision, after all it's highly unlikely our biological neurons/synapses are that precise.
    Reply
  • Threska - Sunday, January 1, 2017 - link

    Analog is a different beast. Reply
  • Holliday75 - Monday, December 12, 2016 - link

    My portfolio likes this news and I am thrilled to see the lack of comments asking how many display ports this card has. Reply
  • BenSkywalker - Monday, December 12, 2016 - link

    What segment are they shooting for with these parts?

    The MI6 which is implied to be an inference targeted device has one quarter the performance at three times the power draw of the P4 for INT8, this doesn't look bad, it looks like an embarrassment to the industry. ~8.5% of the performance per watt of parts that have been shipping for a while now for a product we don't even have a launch date for?

    The MI8 doesn't have enough memory to do any data heavy workloads, it is too big and way too power hungry for its performance for inference, what exactly is this part any good for?

    MI25 without memory amounts, bandwidth and some useful performance numbers(TOPS) it's hard to gauge where this is going to fall. Maybe this could be useful as an entry level device if priced really cheap?

    Their software stack, well, AMD has a justly earned reputation of being a third tier, at best, software development house. The only hope they have it relying on the community, their problem is going to be despite the borderline vulgar levels of misinformation and propaganda to the contrary, this *IS* an established market with massive resources already devoted to it, and AMD is coming very late to the game and are going to try and woo resources that have been working with the competition for years already?

    Comments on high levels of bandwidth for large scale deployment are kind of quaint. Why are you comparing the AMD solutions to those of Intel for high end usage? The high end for this market is using Power/nVidia with NVLink and measuring bandwidth in TB/s, the segment you are talking about is, at best, mid tier.

    What's worse, from a useful information perspective, is your comments that AMD making their own CPUs is rare in this market. In terms of volume the most popular use case for deep learning is going to be paired with ARM processors for the next decade at least- a market that has many players already and nVidia is quickly pushing them out of the segment. The only real viable competition at this point seems likely to come from Intel and their upcoming Xeon Phi parts, which appear to be likely to ship roughly when AMD would be shipping these parts.

    Pretty much, everyone that matters in deep learning makes CPUs.
    Reply
  • p1esk - Monday, December 12, 2016 - link

    *Pretty much, everyone that matters in deep learning makes CPUs.*

    Sorry, what?
    Reply
  • BenSkywalker - Monday, December 12, 2016 - link

    Intel, nVidia, IBM and Qualcomm currently represent all of the major players- I know there are a bunch of FPGA and DSPs on the drawing board, but out of actual shipping solutions, the players all make their own CPUs.

    Obviously I'm talking about the hardware manufacturer side.
    Reply
  • Yojimbo - Monday, December 12, 2016 - link

    It'll be interesting to see what Graphcore's offering is like. Reply

Log in

Don't have an account? Sign up now