Today we’re covering news of a more unusual kind: a roadmap update from Ampere, with a closer look at what the company is planning in terms of architectural and microarchitectural choices for its next-generation server CPUs in 2022 and onwards.

For people not familiar with Ampere, the company was founded back in 2017 by former Intel president Renée James, and was notably built upon a group of former Intel engineers who had left with her for the new venture. Initially, the company relied on IP and design talent from AppliedMicro’s former X-Gene CPU effort, and it still supports legacy products such as the eMAG line-up.

A few years back, Arm began focusing more heavily on designing and releasing datacentre and enterprise CPU IP in the form of its Neoverse core offerings. Over the last year or so we have finally seen the fruits of these efforts in several products built on the first-generation Neoverse N1 server core, such as Amazon’s Graviton2 and, more importantly, Ampere’s “Altra Quicksilver” 80-core server CPU.

The Altra Q line-up, whose flagship Q80-33 SKU we reviewed last winter, was inarguably one of the most impressive Arm server CPU executions in recent years. The chip was able to keep up with or beat the best AMD and Intel had to offer, and it even held that position against the latest generation of Xeon and EPYC parts.

Ampere’s next-generation “Mystique” Altra Max is the next product on the roadmap, slated to begin sampling in the next few months and to be released later this year. The design relies on the same first-generation Arm Neoverse N1 cores at the same maximum 250W TDP, as a drop-in replacement on the same platform, but with an optimised implementation that now allows for up to 128 CPU cores – 60% more cores than the first-generation Altra available today, and double the core count of competing systems from AMD or Amazon’s Graviton2.
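
As a quick sanity check, the core-count claims work out as follows, taking 64 cores as the top core count of AMD’s current EPYC parts and of Graviton2. The per-core lines are an illustrative assumption only: they divide the 250W TDP evenly across all cores and ignore uncore, memory and I/O power. They are a rough budget, not a measured per-core power draw.

```latex
\begin{aligned}
\frac{128}{80} &= 1.6 &&\Rightarrow\ 60\%\ \text{more cores than the 80-core Altra}\\
\frac{128}{64} &= 2   &&\Rightarrow\ \text{double a 64-core EPYC or Graviton2}\\
\frac{250\,\mathrm{W}}{80} &\approx 3.1\,\mathrm{W/core}, &&\quad
\frac{250\,\mathrm{W}}{128} \approx 1.95\,\mathrm{W/core}\ \ \text{(naive even split of TDP)}
\end{aligned}
```

Under that naive even split, holding TDP flat while raising core count is what drives the per-core power budget down.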

For designs beyond the Altra Max, Ampere promises to keep emphasising what it calls “predictable performance” for workloads as socket load scales, to keep increasing core counts with a linear increase in performance, and, a metric I found interesting, to continue reducing power per core – something to keep in mind as we discuss today’s big news:

Replacing Neoverse with Full Custom Cores

Today’s big reveal concerns the microarchitecture choices Ampere will make starting with its next-generation 2022 “Siryn” design, the successor to the Altra Max, and relates to the CPU IP being used:

Starting with Siryn, Ampere will be switching over from Arm’s Neoverse cores to their new in-house full custom CPU microarchitecture. This announcement admittedly caught us completely off-guard, as we had largely expected Ampere to continue to be using Arm’s Neoverse cores for the foreseeable future. The switch to a new full custom microarchitecture puts Ampere on a completely different trajectory than we had initially expected from the company.

In fact, Ampere explains that the move towards a full custom microarchitecture had always been the plan since the company’s inception, and that its custom CPU design has been in the works for the past 3+ years.

In terms of background, the design effort is led by Ampere’s CTO Atiq Bajwa, who is also acting as the chief architect on the project. Bajwa and the team surrounding him appear to be mostly made up of high-profile ex-Intel engineers and veterans who had left the company along with Renée James in 2017, topped off with talent from a slew of other companies in the industry who joined them in the effort. The team’s pedigree includes work on Intel’s Haswell and Broadwell processors.

Ampere’s rationale for designing a full custom core from the ground up is its claim that it can achieve better performance and better power efficiency in datacentre workloads than Arm’s more “general purpose” Neoverse designs. This is quite an interesting claim, and it contrasts with Arm’s projections and goals for its Neoverse cores: the recent Neoverse V1 and N2 cores, unveiled in more detail last month, are claimed to achieve significant generational IPC gains.

For Ampere to give up its reliance on Arm’s next-gen cores, rely instead on its own design, and actually make that switch in its next-gen product shows great confidence in its custom microarchitecture – and at the same time one could interpret it as a sign of no confidence in Arm’s Neoverse IP and roadmap. This stands in stark contrast to what others in the industry are doing: Marvell has stopped development of its own ThunderX CPU IP in favour of adopting Arm Neoverse cores. On the other hand, and not specifically related to the cloud and server market, Qualcomm acquired Nuvia earlier this year with a rationale similar to Ampere’s, claiming that the new in-house design capabilities offer performance that wouldn’t otherwise have been possible with Arm’s Cortex CPU IP.

In our talks with Jeff Wittich, Ampere’s Chief Product Officer, he explains that today’s announcement should help paint a clearer picture of where Ampere is heading as a company: whether it would remain content with “just” being an Arm IP integrator, or whether it had plans for more. Jeff was pretty clear that within a few years they are aiming for Ampere to be a top CPU provider for the cloud market and a major player in the industry.

As for the technical details of how Ampere’s CPU microarchitecture will differ in approach, and how and why the company sees it as a superior performer in the cloud, we’ll have to be a bit more patient. The company wouldn’t comment on the exact status of the Siryn design right now, such as whether it has taped in or taped out yet, but it does reiterate that it’s planning customer sampling in early 2022 in accordance with prior roadmap disclosures. Judging by the tone of the discussions, the design seems mostly complete, with Ampere putting the finishing touches on the whole SoC. Jeff mentioned that in due time they will also make microarchitectural disclosures on the new core, explaining their design choices in areas like front-end and back-end design, and why they see it as a better fit for the cloud market.

Altra Max later this year, more cloud customer disclosures

Beyond the longer-term plans for 2022 and later, today’s roadmap update also reiterated a few performance claims for Ampere’s upcoming 128-core Altra Max, which is planned to hit the market in the second half of the year, with customer sampling in the next few months.

The “Mystique” code-named Altra Max design increases the core count by 60% versus the current-generation Altra, all while staying at or below the same 250W TDP. The performance slides compare it against what are by now previous-generation competitor products; Ampere explains that it simply hasn’t been able to get its hands on more recent Milan or Ice Lake-SP hardware to test. Nevertheless, the relative positioning against the Altra Q80-30 and the EPYC 7742 would indicate that the new chip would easily surpass the performance of even AMD’s latest EPYC 7763.

In the slide, Ampere actually discloses the SKU model name being used for the comparison, which is the "Altra Max M128-30" – meaning for the first time we have confirmation that all 128 cores are running at up to 3GHz clock speed, which is impressive given that we’re supposed to be seeing the same TDP and power characteristics between it and the Q80-33. We’ll be verifying these figures in the next few months once we get to review the Altra Max.

Today’s announcement also comes with an update on Ampere’s customers. Oracle was notably one of the first Altra adopters, but today’s disclosure also includes a wider range of cloud providers, with big names such as ByteDance and Tencent Cloud, two of the biggest hyperscalers in China.

Microsoft in particular is a big addition to the customer list. While Ampere’s Jeff Wittich couldn’t comment on whether Microsoft has other internal plans in the works, he said that today’s announcement should give more clarity around the rumours of the Redmond company working on Arm-based servers, reports of which surfaced back in December. Microsoft’s Azure cloud service is second only to Amazon’s AWS in terms of size and scale, and the company onboarding Altra products is a massive win for Ampere.

Taking control of one’s own future

Ampere’s announcement today that it will deploy its own microarchitecture in future products marks a major change in the company’s prospects. The news admittedly took us by surprise, but in the grand scheme of things it makes a lot of sense: the company aims to be a major industry player in the next few years, and taking full control of its own product future is critical to ensuring that success.

Over the years we’ve seen many CPU design teams disbanded, so having a new player and microarchitecture pop up is a very welcome change for the industry. While the news is a blow to Arm’s Neoverse IP, the fact that Ampere continues to use the Arm architecture is further encouragement and a win for the Arm ecosystem.

  • mode_13h - Wednesday, May 26, 2021

    > You can reduce the area significantly, eg. using a single 64-bit FMA pipe that
    > needs 2 cycles for 128-bit SIMD. That would work fine with code that needs
    > a bit of scalar floating point.

    Alright. I'll concede this point. Maybe one of the points of differentiation in their cores is less area devoted to FP. I doubt it, but we're into pure speculation, here. I certainly can't say you're wrong.

    Let's just hope that whatever they've got in the works is compelling and somehow meaningfully different from what ARM has announced and the other offerings their competitors will have on the market. I do want to see Ampere succeed and live on to mature into a more formidable player.
  • ikjadoon - Thursday, May 20, 2021

    Read it again: Arm releases a new uarch every year. Arm does not release (or simply cannot release due to internal deficiencies / failures) a Neoverse variant of each one.

    uArch improvements are judged per-generation so it’s actually comparable with everyone else.

    You can skip Intel or AMD or Apple generations, too, and cherry-pick your way to absurd “40% Gen over Gen improvement” numbers. 👌

    Nobody is making 40% per year ST gains. 💀

    If you can’t understand that, I’ll let you go. 🙏 Ampere is heavily MT-focused, so their goals are much more likely to be more cores / better core-to-core topologies / lower power.

    //

    Apple has been making “one-off” gains for the better part of a decade. It’s clear nobody else was apparently interested enough to even try.

    You don’t think Intel wants to make big cores? Is AMD some anti-cache fanatic? Nope. They simply did not feel they were necessary and so they reap what they sow. 🤷‍♂️
  • Wilco1 - Thursday, May 20, 2021

    Neoverse N1->N2 is 40% and N1->V1 is 50% gain in a single generation. You simply can't deny that N1->V1 got a 50% performance gain in 2 years. Nobody on here claimed that means 40% year on year or 50% every generation, you're just making that up and cherry-picking your numbers.

    Apple has been consistently ~2 generations ahead of everybody else. The gap is not increasing, so we are talking about one-off gains. If you slap enough cache around a Cortex-X1, clock it high on TSMC 5nm, it will reduce the gap significantly. An Arm slide suggests 20-30% gain is easy.

    AMD/Intel cores are significantly larger than Neoverse cores. There is no doubt AMD's huge caches help a lot, but they are using ~3 times more silicon. Altra achieves incredible performance using a much smaller silicon budget. Future Arm servers could also move to chiplets and increase caches significantly. So again, that's a one-off gain.
  • mode_13h - Friday, May 21, 2021

    > cherry pick your numbers.

    Oof. You're one to talk!
  • mode_13h - Friday, May 21, 2021

    > Ampere Altra is proof of performance and efficiency leadership for a stock Arm core.

    N2 is projected to have *worse* efficiency than N1, to say nothing of the V1!

    I won't repeat what I posted below, but ARM could be running out of gas, on the efficiency front!
  • Wilco1 - Friday, May 21, 2021

    Biased much? If anything getting 40% IPC gain at pretty much the same efficiency is incredibly impressive. It's normal in the industry for higher IPC cores to lose efficiency. I'm not sure how you can say Arm is running out of gas when on the same process you have 128 Arm cores using less power than 64 cores in Milan...
  • mode_13h - Friday, May 21, 2021

    > Biased much?

    I'm just trying to take an honest look at the data. Maybe you should re-read that article's conclusion. It voices similar reservations about the N2's power consumption.

    The point is not a minor one. This is a major and disturbing break in ARM's trend of advancing perf/W, and it's in their product line which is supposed to balance efficiency as an equal priority.

    > I'm not sure how you can say Arm is running out of gas

    ...on the efficiency front! You win nothing by such blatant twisting of my words.
  • Wilco1 - Saturday, May 22, 2021

    Is it really honest? N2's perf/W on the same process is just 3.5% lower than N1. That's not a big deal, a "disturbing break in ARM's trend of advancing perf/W" or "ARM could be running out of gas, on the efficiency front".

    Now if the IPC gain was just 10% instead of 40% then you might have a point, but maintaining efficiency at such large IPC gains is extremely difficult and unheard of.
  • mode_13h - Sunday, May 23, 2021

    > N2's perf/W on the same process is just 3.5% lower than N1.

    Okay, I had thought it was at 5nm, which was used for their other projections. I now see that the efficiency slide says "(ISO process & configuration)". So, if most/all implementations use 5 nm, then it should improve on perf/W, which should keep it competitive.

    Well, I learned something I would've missed without this exchange, so thanks.

    > maintaining efficiency at such large IPC gains is extremely difficult and unheard of.

    Is it? Hasn't Apple shown good efficiency at even greater IPC?
  • mode_13h - Friday, May 21, 2021

    > Neoverse N1 was announced in February 2019 with implementations late 2019.
    > Neoverse V1 was just announced with first implementations likely later this year.
    > So that's 50% performance gain in 2 years.

    Which models and numbers you pick depends on your goal. If you just want to make ARM look as impressive as possible, then I think you're on the right track. However, what you've done is like comparing the machine learning performance of Comet Lake vs Ice Lake-SP. You could tout some insane year-on-year improvement, but that ignores the fact that they're in different product lines and were optimized (and priced) to do different sorts of things. Also, Comet Lake's micro-architecture is like 5 years old, even though it just launched last year.

    So, go ahead and bicker over technicalities, like which CPU started shipping in which year. However, if the point is to establish some sort of performance trendline, in order to try and predict what sorts of improvements *future* ARM cores might provide, you're just leading yourself astray.