AMD Prepares 32-Core Naples CPUs for 1P and 2P Servers: Coming in Q2
by Ian Cutress on March 7, 2017 10:15 AM EST - Posted in
- Enterprise
- CPUs
- AMD
- SoCs
- Enterprise CPUs
- Zen
- Naples
- Ryzen
For users keeping track of AMD’s rollout of its new Zen microarchitecture, stage one was last week’s launch of Ryzen, its new desktop-oriented product line. Stage three is the APU launch, focusing mainly on mobile parts. In the middle is stage two, Naples, arguably the meatier element of AMD’s Zen story.
A lot of fuss has been made about Ryzen and Zen, with AMD’s return to high-performance x86. If you go by column inches, the consumer-focused Ryzen platform is the one most talked about and, many would argue, the most important. In our interview with Dr. Lisa Su, CEO of AMD, she described the launch of Ryzen as a big hurdle in that journey. However, in the next sentence, Dr. Su listed Naples as another big hurdle, and if you spend some time with any of the regular technology industry analysts, they will tell you that Naples is where AMD’s biggest chunk of the pie is. Enterprise is where the money is.
So while the consumer product line gets columns, the enterprise product line gets profits and high margins. Launching an enterprise product that gains even a few points of market share from the very large blue incumbent can add billions of dollars to the bottom line, as well as provide some innovation now that there are two big players on the field. One could argue there are three players, if you consider that ARM holds a few niche areas; however, one of the big barriers to ARM adoption, aside from the lack of high single-core performance, is the transition from the x86 to the ARM instruction set, which requires code to be rewritten. If AMD can rejoin x86 enterprise as a big player, it puts a small stop on some of ARM’s ambitions and aims to take a big enough chunk out of Intel.
With today’s announcement, AMD is setting the scene for its upcoming Naples platform. Naples will not be the official name of the product line; as we discussed with Dr. Su, Opteron is one option being debated internally at AMD for the product name. Nonetheless, Naples builds on Ryzen, using the same core design but implementing it in a big way.
The top end Naples processor will have a total of 32 cores, with simultaneous multi-threading (SMT), to give a total of 64 threads. This will be paired with eight channels of DDR4 memory, up to two DIMMs per channel for a total of 16 DIMMs, and altogether a single CPU will support 128 PCIe 3.0 lanes. Naples also qualifies as a system-on-a-chip (SoC), with a measure of internal IO for storage, USB and other things, and thus may be offered without a chipset.
Naples will be offered as either a single processor platform (1P) or a dual processor platform (2P). In dual processor mode, which gives a system with 64 cores and 128 threads, each processor will use 64 of its PCIe lanes as a communication bus between the processors as part of AMD’s Infinity Fabric. The Infinity Fabric uses a custom protocol over these lanes, but bandwidth is designed to be on the order of PCIe. As each processor uses 64 of its PCIe lanes to talk to the other, each CPU can give its remaining 64 lanes to the rest of the system, for a total of 128 PCIe 3.0 lanes again.
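The lane budget in the two configurations can be checked with a few lines of arithmetic. This is an illustrative sketch only: it assumes every CPU has 128 PCIe 3.0 lanes and that a 2P system reserves exactly 64 lanes per CPU for the socket-to-socket link, as described above. The `Socket` class and `system_totals` function are hypothetical helpers, not an AMD tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Socket:
    cores: int = 32         # top-end Naples part
    smt: int = 2            # hardware threads per core
    pcie_lanes: int = 128   # PCIe 3.0 lanes per CPU


# Assumption (per the announcement): in 2P mode each CPU dedicates
# 64 of its lanes to the Infinity Fabric link to the other socket.
FABRIC_LANES_2P = 64


def system_totals(sockets: int, cpu: Socket = Socket()) -> dict:
    """Cores, threads and device-facing PCIe lanes for a 1P or 2P system."""
    if sockets not in (1, 2):
        raise ValueError("Naples is announced for 1P and 2P only")
    fabric = FABRIC_LANES_2P if sockets == 2 else 0
    return {
        "cores": sockets * cpu.cores,
        "threads": sockets * cpu.cores * cpu.smt,
        "device_lanes": sockets * (cpu.pcie_lanes - fabric),
    }


print(system_totals(1))  # {'cores': 32, 'threads': 64, 'device_lanes': 128}
print(system_totals(2))  # {'cores': 64, 'threads': 128, 'device_lanes': 128}
```

Doubling the sockets doubles cores and threads, but the lanes spent on the inter-socket link cancel out the extra CPU's lanes, which is why the device-facing total stays at 128.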
On the memory side, with eight channels and two DIMMs per channel, AMD is stating that they officially support up to 2TB of DRAM per socket, or 4TB in a dual-socket server. The total memory bandwidth available to a single CPU clocks in at 170 GB/s.
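As a rough check on these memory figures: 2TB spread across 16 DIMMs implies 128GB modules, and theoretical peak bandwidth is channels × 8 bytes × transfer rate. AMD did not state the memory speed behind its 170 GB/s figure; the sketch assumes DDR4-2666, which is the speed that reproduces it. The function names are hypothetical.

```python
BYTES_PER_TRANSFER = 8  # a 64-bit DDR4 channel moves 8 data bytes per transfer (ECC bits excluded)


def capacity_per_dimm_gb(total_gb: float, channels: int = 8, dimms_per_channel: int = 2) -> float:
    """Module size needed to reach total_gb with every DIMM slot populated."""
    return total_gb / (channels * dimms_per_channel)


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak in GB/s (10^9 bytes/s): channels x 8 bytes x MT/s."""
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s / 1000


print(capacity_per_dimm_gb(2048))   # 128.0 GB per DIMM for 2TB per socket
print(peak_bandwidth_gbs(8, 2666))  # assumed DDR4-2666: about 170.6 GB/s
print(peak_bandwidth_gbs(8, 2400))  # DDR4-2400, the speed used in AMD's 64-core demo
```

Real-world sustained bandwidth will be lower than these theoretical peaks.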
While not specifically mentioned in the announcement today, we do know that Naples is not a single monolithic die on the order of 500mm² or larger. Naples uses four of AMD’s Zeppelin dies (the Ryzen dies) in a single package. With each Zeppelin die measuring 195.2mm², an equivalent monolithic die would total around 780mm² of silicon and around 19.2 billion transistors, which is far bigger than anything GlobalFoundries has ever produced, let alone attempted at 14nm. During our interview with Dr. Su, we postulated that multi-die packages would be the way forward on future process nodes given the difficulty of manufacturing such large dies, and Dr. Su’s response indicated that this was a prominent direction to go in.
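The die arithmetic is simple, and a textbook Poisson yield model, Y = exp(-A × D0), hints at why a 780mm² monolithic die is unattractive. The defect density below (0.2 defects/cm²) is a purely hypothetical value chosen for illustration, not a GlobalFoundries figure, so only the relative comparison means anything.

```python
import math

ZEPPELIN_AREA_MM2 = 195.2       # per-die area quoted above
ZEPPELIN_TRANSISTORS_BN = 4.8   # implied by ~19.2 billion transistors across four dies
DIES = 4

HYPOTHETICAL_D0 = 0.2  # defects per cm^2; illustrative assumption only


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson model, Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


total_area = DIES * ZEPPELIN_AREA_MM2                # about 780.8 mm^2
total_transistors = DIES * ZEPPELIN_TRANSISTORS_BN   # about 19.2 billion

print(f"Equivalent monolithic die: {total_area:.1f} mm^2, {total_transistors:.1f}B transistors")
print(f"Yield, one Zeppelin die:  {poisson_yield(ZEPPELIN_AREA_MM2, HYPOTHETICAL_D0):.0%}")
print(f"Yield, 780 mm^2 monolith: {poisson_yield(total_area, HYPOTHETICAL_D0):.0%}")
```

Because each small die can be tested before packaging, a multi-die part only needs four good small dies rather than one good large one, which is where the yield advantage comes from.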
Each die provides two memory channels, which brings us up to eight channels in total. However, each die only has 16 PCIe 3.0 lanes (24 if you count the PCH/NVMe lanes), so four dies would supply only 64 (or 96) lanes rather than 128, meaning that some form of mux/demux, PCIe switch, or accelerated interface is being used. This could be extra silicon on the package, given that AMD has so far used a single die variant for its Zen designs.
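Summing the per-die resources shows where the lane shortfall comes from. This sketch simply multiplies the per-die figures given above by four; the constant and function names are hypothetical.

```python
DIES = 4
CHANNELS_PER_DIE = 2
LANES_PER_DIE = 16           # PCIe 3.0 lanes per Zeppelin die, as stated above
LANES_PER_DIE_WITH_PCH = 24  # counting the PCH/NVMe lanes as well
PLATFORM_LANES = 128         # announced per-CPU total


def aggregate(per_die: int, dies: int = DIES) -> int:
    """Total of a per-die resource across every die in the package."""
    return per_die * dies


channels = aggregate(CHANNELS_PER_DIE)              # 8, matches the announced eight channels
lanes = aggregate(LANES_PER_DIE)                    # 64
lanes_with_pch = aggregate(LANES_PER_DIE_WITH_PCH)  # 96

print(f"{channels} channels; {lanes} lanes ({lanes_with_pch} with PCH/NVMe) "
      f"vs {PLATFORM_LANES} announced: {PLATFORM_LANES - lanes} lanes unaccounted for")
```

The memory channels add up neatly, while the lanes do not, which is what points to extra switching or interface logic somewhere in the package.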
Note that we’ve seen multi-die packages before in previous products from both AMD and Intel. Despite both companies experimenting with multi-die or 2.5D technology (AMD with Fury, Intel with EMIB), we are led to believe that these CPUs are similar to previous multi-chip designs, except that Infinity Fabric runs between the dies. At what bandwidth, we do not know at this point. It is also pertinent to note that there is a lot of discussion about the strength of AMD’s Infinity Fabric, as well as how threads are handled within a single die, which has two core complexes of four cores each. This is something we are investigating on the consumer side, and it will likely be very relevant on the enterprise side as well.
In the land of benchmark numbers we can’t verify (yet), AMD showed demonstrations at the recent Ryzen Tech Day. The main demonstration was a sparse matrix calculation on a 3D dataset for seismic analysis. In this test, solving a 15-diagonal matrix of 1 billion samples took 35 seconds on an Intel machine vs 18 seconds on an AMD machine (both machines using 44 cores and DDR4-1866). When allowed to use its full 64 cores and DDR4-2400 memory, AMD shaved another four seconds off. Again, we can’t verify these results, and it’s a single data point, but a diagonal matrix solver is a reasonable proxy for an enterprise workload. We were told that the clock frequencies for each chip were at stock; however, AMD did say that the Naples clocks were not yet finalized.
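Expressed as speedups, AMD’s demo numbers look like this. The 14-second figure for the 64-core run is derived from “shaved another four seconds off” rather than quoted by AMD, and the `speedup` helper is a hypothetical one-liner.

```python
def speedup(baseline_s: float, new_s: float) -> float:
    """How many times faster new_s is than baseline_s (wall-clock seconds)."""
    return baseline_s / new_s


intel_44c = 35.0          # Intel, 44 cores, DDR4-1866
amd_44c = 18.0            # AMD, 44 cores, DDR4-1866
amd_64c = amd_44c - 4.0   # AMD, 64 cores, DDR4-2400 ("another four seconds off")

print(f"AMD 44c vs Intel 44c: {speedup(intel_44c, amd_44c):.2f}x")  # 1.94x
print(f"AMD 64c vs Intel 44c: {speedup(intel_44c, amd_64c):.2f}x")  # 2.50x
print(f"AMD 64c vs AMD 44c:   {speedup(amd_44c, amd_64c):.2f}x")    # 1.29x
```

Going from 44 to 64 cores is about 45% more cores, plus faster memory, for roughly 1.29x; with a single data point the two effects cannot be separated.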
What we don’t know are power numbers, frequencies, processor lists, pricing, partners, segmentation, and all the meaty stuff. We expect AMD to mount a strong attack on the 1P/2P server markets, which is where 99% of the enterprise is focused, particularly where high-performance virtualization or storage is needed. How Naples migrates into the workstation space is unknown, but I hope it does. We’re working with AMD to secure samples for Johan and me in advance of the Q2 launch.
91 Comments
deltaFx2 - Tuesday, March 7, 2017
@ DanNeely: Yes, but most of the server space is like that. The CPU is constantly 'waiting' on something: disk, n/w, other I/O, memory, etc. That's pretty much the point of a server: huge data sets, lots of memory, lots of I/O. Re. general CPU perf, Zen is extremely competitive with Intel (you have to recall that the frequency advantage the i7-7700 has will go away in servers; server parts clock at 2-3GHz base). Where it clearly lacks is AVX-256, which is largely an HPC thing. Also, if you're memory bound, AVX-256 (or 512) won't help you anyway. You're limited by bandwidth.
shing3232 - Tuesday, March 7, 2017
If you talk about HPC, they could rely on co-processors such as NV or AMD compute cards for those applications. I mean, even Intel relied on their Xeon Phi co-processor to build supercomputers.
lefty2 - Tuesday, March 7, 2017
Also, I noticed from the reviews that Ryzen integer performance seems a bit weak.
EasyListening - Wednesday, March 15, 2017
Which reviews? Because I saw a chart (http://wccftech.com/amd-ryzen-7-1800x-8-core-bench...) showing that Ryzen's integer performance blows the competition away. I don't know about the floating point performance, but it's probably meh, due to the plan to offload math to the GPU. Also, the crazy number of PCIe lanes means it can link to anything using a non-proprietary interface, PCIe, and maybe even something like NVLINK and a Tesla cluster. I'm no expert. Also, the whole AVX-128 vs AVX-256 thing is kind of a red herring afaik, due to the fact that AVX-256 is very computationally expensive, and there were some thermal issues related to trying to do the whole 256 in one go, or something like that.
iwod - Tuesday, March 7, 2017
If Ryzen's desktop prices are any indication, the era of cheap in-memory computing is coming. No longer do you have to spend an ARM and a leg to get the CPU AND the motherboard just because you need 512GB, and up to 2TB RAM, or 4TB RAM in a 1U case. Of course, that is assuming 128GB RAM modules arrive (they were announced in 2015).
bill.rookard - Tuesday, March 7, 2017
Nifty stuff. What I'm really interested to see is how this not only scales upwards, but downwards as well. Truthfully, there are some workloads out there which don't require 32 physical cores. I'm thinking 2 core or 4 core (either with SMT) for small scale storage servers. If they can fit 32 physical cores on a single die @ <130W, having a nice dual/quad core for home NAS units with some support for ECC would be terrific. (Yes, I mean similar to FreeNAS w/ ZFS).
Right now the only choice for small NAS builds are Xeons, and while you can get older Xeons used on the secondary market and fit them with ECC, trying to get some of the low power Xeons is difficult because they're stupidly expensive.
Is having a nominally affordable quad core ECC small MOBO server too much to ask? (apparently so if you can only pick Intel).
That being said, I did read that the memory controller on the Ryzen consumer CPU ---IS--- ECC enabled, but it has to be implemented at the motherboard level.
ddriver - Tuesday, March 7, 2017
"Right now the only choice for small NAS builds are Xeons"
Nah, those mediocre bricking Atoms were created specifically for this purpose. The boards come with scores of SATA connectors, ECC is supported, and there are on-die high-speed NICs.
NAS is NOT server. I doubt Zen will feature a native dual core; maybe in time, as bad dies pile up, they might launch a dual core version based on failed quads with disabled cores.
rustyshackelford - Tuesday, March 7, 2017
Ryzen supports ECC.
cygnus1 - Tuesday, March 7, 2017
FYI, it's not a single 32 core die. It's a multi-chip package. From the article "Naples uses four of AMD’s Zeppelin dies (the Ryzen dies) in a single package." I wouldn't hold your breath though for a model with a low core count, because it would likely screw up other parts of the SOC. Cutting out dies would likely mean fewer memory controllers and fewer PCIe lanes, which may even mean a different socket would be needed. I would think you might end up seeing, way down the road, models with as low as 16 cores. But I doubt much lower than that, because it would either mean they're having terrible yields and have a lot of junk dies to sell (which does not appear to be the case) or they'd have to disable cores on dies that work fine that they could otherwise sell for more money. Demand would have to be incredibly high for lower core counts to get them to go down that road with this platform.
Also, I'm pretty sure almost all AMD chips for many years have had ECC enabled in their memory controllers and just depended on the motherboard to support it.
phoenix_rizzen - Tuesday, March 7, 2017
Every AMD CPU from the original Athlon XP/MP has supported ECC RAM. There may have been a few 32-bit Duron/Sempron CPUs that didn't, but the 64-bit ones did. I don't think AMD makes a 64-bit CPU without ECC support.
Not every motherboard supported ECC, sure. But every CPU was capable of supporting it.
Actually, when you get right down to it, every AMD CPU has been virtually identical in the features that were supported (across family/generational lines). The only things you really had to decide on were the number of cores, the frequency, and the TDP. Everything else was the same (NX, SVM, ECC, all the different extensions, etc).
It's one of the nicer things about choosing AMD systems, especially when compared to the hideously huge, complicated matrices needed to find an Intel CPU that supports all the features you think you may need at some point down the line, and cross-referencing that with what your local supplier has access to. We were a heavy user of AMD systems (desktop and server) for many years because of this. We started purchasing Xeon E5 systems last year for our VM hosting servers as the Opteron systems just haven't been keeping up. :(
Will be interesting to see what the system prices are like next year with Naples. Our Xeon days could be just a blip. :)