Radeon Instinct Hardware: Polaris, Fiji, Vega

Diving deeper into matters, let’s talk about the Radeon Instinct cards themselves. The Instinct cards are for all practical purposes a successor (or spin-off) to AMD’s current FirePro S series cards, so if you are familiar with AMD’s hardware there, then you know what to expect. These are passively cooled cards geared for large-scale server installations, offered across a range of power and performance options.

As this is a new product line, the Instinct cards don’t have any immediate predecessors in AMD’s FirePro S lineup, but unsurprisingly, AMD has structured their new family of server cards similarly to how NVIDIA has structured their P4/P40/P100 lineup of deep learning cards. All told, AMD is announcing 3 cards today, all 3 of which tap different AMD GPUs and are (roughly) named after their expected performance levels.

AMD Radeon Instinct
                         Instinct MI6   Instinct MI8    Instinct MI25
Memory Type              16GB GDDR5     4GB HBM         "High Bandwidth Cache and Controller"
Memory Bandwidth         224GB/sec      512GB/sec       ?
Single Precision (FP32)  5.7 TFLOPS     8.2 TFLOPS      12.5 TFLOPS
Half Precision (FP16)    5.7 TFLOPS     8.2 TFLOPS      25 TFLOPS
TDP                      <150W          <175W           <300W
Cooling                  Passive        Passive (SFF)   Passive
GPU                      Polaris 10     Fiji            Vega
Manufacturing Process    GloFo 14nm     TSMC 28nm       ?

Starting things off, we have the Radeon Instinct MI6. This is a Polaris 10 card analogous to the consumer RX 480. As Polaris doesn’t have much in the way of special capabilities for deep learning (more on this in a second), AMD is pitching the card as their baseline card for neural network inference (execution). At 5.7 TFLOPS (FP16 or FP32) it will draw under 150W, and while pricing for the family hasn’t been announced, I believe it’s a safe bet that as the baseline card the MI6 will offer the best performance per dollar across the Instinct family.

Meanwhile, in an unexpected move, AMD will be keeping their 2015 Fiji GPU around for the second card, the Instinct MI8. This card is for all intents and purposes a rebranded Radeon R9 Nano, AMD’s power-tuned Fiji card that has proven quite popular with their server customers. Within the Instinct lineup, it is essentially an unusual variant of the MI6, offering higher throughput and greatly increased memory bandwidth for only a small increase in power consumption, with the drawback of Fiji’s 4GB VRAM limitation. Since it offers better performance than the MI6 and is smaller to boot, I expect we’ll see AMD pitch the MI8 as a premium alternative for inference.

The MI6 and MI8 will be going up against NVIDIA’s P4 and P40 accelerators. AMD’s cards don’t directly line up against the NVIDIA cards in power consumption or expected performance, so the competitive landscape is somewhat broad, but those are the cards AMD will need to dethrone in the inference landscape. One potential issue here, which I’m waiting to see if and how AMD addresses closer to the launch of the Instinct family, is the lack of high-speed modes for lower precision operations. The competing Tesla cards can process 8-bit integer (INT8) operations at up to 4x speed, something the MI6 and MI8 Instinct cards can’t do. INT8 is something of a special case, but if NVIDIA’s expectations for inferencing with INT8 come to pass, then it means AMD has to compete more strongly on price than performance.
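For readers unfamiliar with what INT8 inference involves, the following is a minimal sketch of one common approach, symmetric per-tensor quantization, written in plain Python. It is a generic illustration, not a description of how NVIDIA's hardware or any particular framework implements it; the function names `quantize_int8` and `int8_dot` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(qa, scale_a, qb, scale_b):
    """Dot product done with integer multiply-accumulates, rescaled once at the end."""
    acc = sum(x * y for x, y in zip(qa, qb))  # integer-only inner loop
    return acc * scale_a * scale_b


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25]
    activations = [1.0, 2.0, -4.0]
    exact = sum(w * a for w, a in zip(weights, activations))
    qw, sw = quantize_int8(weights)
    qa, sa = quantize_int8(activations)
    print("exact:", exact, "int8 approximation:", int8_dot(qw, sw, qa, sa))
```

The inner loop touches only small integers, which is the part that hardware with a fast INT8 path can accelerate; the price is a small approximation error relative to the floating point result.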

Last, but certainly not least in the Instinct family is the most powerful card of them all, and arguably the cornerstone of what the family is meant to become: the MI25. This is based on AMD’s forthcoming Vega GPU family, and while AMD is not sharing much in the way of new details on Vega today, they are leaving no doubts that this is going to be a high performance card. The passively cooled card is rated for sub-300W operation, and based on AMD performance projections elsewhere, AMD makes it clear that they’re targeting 25 TFLOPS FP16 (12.5 TFLOPS FP32) performance.
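The peak figures in the spec table can be turned into rough efficiency numbers. The sketch below assumes the TDP ceilings (for example "<150W") can stand in for actual power draw, which makes the results lower bounds on efficiency at peak rather than measurements; the names `CARDS`, `gflops_per_watt`, and `summarize` are made up for this illustration.

```python
# Peak throughput (TFLOPS) and TDP ceilings (watts) copied from the spec table.
# TDPs are upper bounds, so the efficiency figures below are floors, not measurements.
CARDS = {
    "MI6": {"fp32": 5.7, "fp16": 5.7, "tdp_w": 150},
    "MI8": {"fp32": 8.2, "fp16": 8.2, "tdp_w": 175},
    "MI25": {"fp32": 12.5, "fp16": 25.0, "tdp_w": 300},
}


def gflops_per_watt(tflops, watts):
    """Convert a TFLOPS rating and a power figure into GFLOPS per watt."""
    return tflops * 1000.0 / watts


def summarize(cards):
    """Return the FP16:FP32 ratio and per-watt throughput for each card."""
    rows = {}
    for name, c in cards.items():
        rows[name] = {
            "fp16_to_fp32": c["fp16"] / c["fp32"],
            "fp32_gflops_per_w": gflops_per_watt(c["fp32"], c["tdp_w"]),
            "fp16_gflops_per_w": gflops_per_watt(c["fp16"], c["tdp_w"]),
        }
    return rows


if __name__ == "__main__":
    for name, r in summarize(CARDS).items():
        print(
            f"{name}: FP16/FP32 ratio {r['fp16_to_fp32']:.1f}, "
            f"FP32 {r['fp32_gflops_per_w']:.1f} GFLOPS/W, "
            f"FP16 {r['fp16_gflops_per_w']:.1f} GFLOPS/W"
        )
```

On these paper numbers, the MI25 is the only card whose FP16 rate is double its FP32 rate, and that is where its per-watt advantage for FP16 work comes from.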

Significantly, one of the few things AMD is confirming about Vega right now is that it supports packed math formats for FP16 operations. This is something that first appeared in Sony’s PlayStation 4 Pro, with a strong hint that it was a feature of a future AMD architecture, and now this has been confirmed.

With AMD pitching the MI25 as a training accelerator, offering a packed math mode for FP16 is critical to the product. Neural network training very rarely requires higher precision FP32 math, which is otherwise the default for GPUs. Instead, FP16 is suitably precise for a process that is inherently imprecise, and as a result offering a fast FP16 mode makes the card significantly faster at its intended task. Combined with the already high throughput rates of GPUs due to their wide arrays of ALUs, this is what makes GPUs so potent at neural network training.
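The idea behind packed FP16 math can be illustrated with a small Python sketch: two half-precision values occupy the same 32 bits as a single FP32 value, which is why hardware that operates on both halves at once can double its FP16 rate (25 TFLOPS versus 12.5 TFLOPS on the MI25). This only demonstrates the storage layout and the precision trade-off using the standard `struct` module; it says nothing about how Vega implements the feature, and the helper names are hypothetical.

```python
import struct


def pack_fp16_pair(a, b):
    """Encode two values as IEEE 754 half floats sharing one 32-bit word."""
    return struct.pack("<ee", a, b)


def unpack_fp16_pair(word):
    """Decode a 32-bit word back into its two half-precision values."""
    return struct.unpack("<ee", word)


def to_fp16(x):
    """Round a Python float to the nearest representable FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    word = pack_fp16_pair(1.5, -2.0)
    # Two FP16 values take exactly as many bytes as one FP32 value.
    print(len(word), len(struct.pack("<f", 1.5)))
    print(unpack_fp16_pair(word))
    # FP16 keeps far fewer significant bits, so most values are only approximated.
    print(to_fp16(0.1))
```

Values such as 1.5 and -2.0 survive the round trip exactly, while 0.1 comes back slightly off. That rounding error is the precision being traded for throughput.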

As AMD’s sole training card, the MI25 will be going up against NVIDIA’s flagship accelerator, the Tesla P100. And as opposed to the inference cards, this has the potential to be a much closer fight. AMD has parity on packed instructions, with performance that on paper would exceed the P100. AMD has yet to fully unveil what Vega can do – we have no idea what “NCU” stands for or what AMD’s “high bandwidth cache and controller” are all about – but on the surface there’s the potential for the kind of knock-down fight at the top that makes for an interesting spectacle. And for AMD the stakes are huge; even if they can’t necessarily win, being able to price the MI25 even remotely close to the P100 would give them huge margins. More practically speaking, it means they could afford to significantly undercut NVIDIA in this space to capture market share while still making a tidy profit.

On a final note, while AMD isn’t commenting on the future of FirePro S or other server GPU products – so it’s not clear if Instinct will be their entire server GPU backbone or only part of it – it’s interesting to note that they are pointing out that one of the ways they intend to stand out from NVIDIA is to not restrict their virtualization support to certain cards.

In other words, if Instinct does end up being AMD’s sole line of server cards, then these cards will be fully capable of serving the virtualization market just as well as the deep learning markets.


39 Comments


  • CoD511 - Saturday, December 31, 2016 - link

    Well, what I find shocking is the P4 with a 5.5TFLOP rating at 50w/75w versions as the rated maximum power, not even using TDP to obfuscate the numbers. It's right near the output of the 1070 but the power numbers are just, what? If that's true and it may well be considering they're available products, I wonder how they've got that set up to draw so little power yet output so much or process at such speed.
  • jjj - Monday, December 12, 2016 - link

    The math doesn't work like that at all.
    Additionally, we don't know the die size and the GPU die is not the only thing using power on a GPU AiB.
    What we do know is that it gets more at 300W rated TDP than Nvidia's P100.
  • ddriver - Monday, December 12, 2016 - link

    "The math doesn't work like that at all." - neither do flamboyant statements devoid of substantiation. The numbers are exactly where I'd expect them to be based on rough estimates on the cost of implementing a more fine-grained execution engine.
  • RussianSensation - Monday, December 12, 2016 - link

    Almost everyone has gotten this wrong this generation. It's not 2 node jumps because the 14nm GloFo and 16nm TSMC are really "20nm equivalent nodes." The 14nm/16nm at GloFo and TSMC are more marketing than a true representation.

    "Bottom line, lithographically, both 16nm and 14nm FinFET processes are still effectively offering a 20nm technology with double-patterning of lower-level metals and no triple or quad patterning."

    https://www.semiwiki.com/forum/content/1789-16nm-f...

    Intel's 14nm is far superior to the 14nm/16nm FinFET nodes offered by GloFo and TSMC at the moment.
  • abufrejoval - Monday, December 19, 2016 - link

    Which is *exactly* why I find the rumor that AMD is licensing its GPUs to Chipzilla to replace Intel iGPUs so scary: With that Intel would be able to produce true HSA APUs with HBM2 and/or EDRAM which nobody else can match.

    Intel has given away more than 50% of silicon real-estate for years for free to starve off Nvidia and AMD (isn't that illegal silicon dumping?) and now they could be ripping the crown jewels off a starving AMD to crash NVidia where Knights Lansing failed.

    AMD having on-par CPU technology now is only going to pull some punch, when it's accompanied with a powerful GPU part in their APUs that Intel can't match and NVidia can't deliver.

    They license that to Intel, they are left with nothing to compete with.

    Perhaps Intel lured AMD by offering their foundries for dGPU, which would allow ATI to make a temporary return. I can't see Intel feeding snakes at their fab-bosom (or producing "Zenselves").

    At this point in the silicon end game, technology becomes a side show to politics and it's horribly fascinating to watch.
  • cheshirster - Sunday, January 8, 2017 - link

    If Apple is a customer everything is possible.
  • lobz - Tuesday, December 13, 2016 - link

    ddriver......

    do you have any idea what else is going on under the hood of that surprisingly big card? =}

    there could be a lot of things accumulating that add up to <300W, which is still lower than the P100's 10,6 TF @ 300W =}
  • hoohoo - Wednesday, December 14, 2016 - link

    You're being hyperbolic.

    24.00 W/TF for Vega.
    21.34 W/TF for Fiji.

    12% higher power use for Vega. That's not really gutted.
  • Haawser - Monday, December 12, 2016 - link

    PCIe P100 = 18.7TF of 16bit in 250W

    PCIe Vega = ~24TF of 16bit in 300W

    In terms of perf/W Vega might get ~22% more perf for ~20% more power. So essentially they should be near as darnit the same. Except AMD will probably be cheaper, and because each card is more powerful, you'll be able to pack more compute into a given amount of rack space. Which is what the people who run multi-million $ HPC research machines will *really* be interested in, because that's kind of their job.
  • Ktracho - Monday, December 12, 2016 - link

    Not all servers can handle providing 300 W to add in cards, so even if they are announced as 300 W cards, they may be limited to something closer to 250 W in actual deployments.
