Huge Memory Bandwidth, but not for every Block

One highly intriguing aspect of the M1 Max, maybe less so for the M1 Pro, is the massive memory bandwidth that is available for the SoC.

Apple was keen to market its 400GB/s figure during the launch, but the number is so far beyond what we usually see that it leaves a lot of open questions as to how the chip is able to take advantage of this kind of bandwidth, so it’s one of the first things to investigate.

Starting off with our memory latency tests, the new M1 Max changes system memory behaviour quite significantly compared to what we’ve seen on the M1. On the core and L2 side of things, there haven’t been any changes, and we consequently don’t see much difference in the results: it’s still a 3.2GHz peak core with 128KB of L1D at a 3-cycle load-to-load latency, and a 12MB L2 cache.

Where things are quite different is when we enter the system cache (SLC): instead of 8MB, on the M1 Max it’s now 48MB, and it is also a lot more noticeable in the latency graph. While being much larger, it’s also evidently slower than the M1 SLC. The exact figures here depend on the access pattern, but even the linear chain access shows that data has to travel a longer distance than on the M1 and the corresponding A-series chips.
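Latency tests of this kind are typically built as a chain of dependent loads: each element stores the index of the next element to visit, so no load can start before the previous one has completed. The following is a minimal Python sketch of how a linear chain and a fully random chain could be constructed. It is not the harness used for the results here, the function names are hypothetical, and interpreter overhead makes Python timings meaningless, so only the construction of the access pattern is illustrated.

```python
import random


def linear_chain(n):
    """Each slot points at the next slot, wrapping around at the end."""
    return [(i + 1) % n for i in range(n)]


def random_chain(n, seed=0):
    """One random cycle through all n slots (Sattolo's algorithm)."""
    rng = random.Random(seed)
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j is strictly below i, which forces a single cycle
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def walk(chain, steps, start=0):
    """Follow the chain; every lookup depends on the previous result."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx
```

A single cycle matters because a plain random shuffle can split into several short loops, in which case the walk would keep revisiting a small subset of the buffer and understate latency at large test depths.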

DRAM latency goes up this generation, even though the M1 Max’s memory is faster on paper in terms of frequency and bandwidth. At a comparable 128MB test depth, the new chip is roughly 15ns slower. The larger SLC, the more complex chip fabric, and possibly worse timings on the new LPDDR5 memory could all add to the regression we’re seeing here. In practical terms, because the SLC is so much bigger this generation, workload latencies should still be lower on the M1 Max thanks to higher cache hit rates, so performance shouldn’t regress.
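The argument that a larger SLC can offset slower DRAM can be made concrete with a weighted average of hit and miss latency. The sketch below assumes a simple two-level model (an access either hits the SLC or goes to DRAM) and uses purely hypothetical latencies and hit rates chosen for illustration; only the roughly 15ns DRAM penalty comes from the measurements above.

```python
def average_latency_ns(slc_hit_rate, slc_latency_ns, dram_latency_ns):
    """Expected latency of an access that has already missed the L2."""
    return slc_hit_rate * slc_latency_ns + (1.0 - slc_hit_rate) * dram_latency_ns


# Hypothetical values, NOT measurements: only the 15 ns delta is from the text.
small_slc = average_latency_ns(0.30, 40.0, 100.0)         # smaller, faster SLC
large_slc = average_latency_ns(0.60, 50.0, 100.0 + 15.0)  # larger, slower SLC and DRAM

print(f"small SLC: {small_slc:.1f} ns, large SLC: {large_slc:.1f} ns")
```

With these made-up inputs the larger cache comes out ahead despite both the SLC and DRAM being slower, but the outcome depends entirely on how much the hit rate actually improves for a given workload.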

A lot of people in the HPC audience were extremely intrigued to see a chip with such massive bandwidth, not because they care about the GPU or other offload engines of the SoC, but because of the possibility of the CPUs having access to such immense bandwidth, something that is otherwise only achievable on larger server-class CPUs that cost a multiple of what the new MacBook Pros are sold at. It was also one of the first things I tested: exactly how much bandwidth the CPU cores have access to.

Unfortunately, the news here isn’t the best-case scenario that we hoped for, as the M1 Max isn’t able to fully saturate the SoC bandwidth from just the CPU side.

From a single-core perspective, meaning from a single software thread, things are quite impressive for the chip, as it’s able to stress the memory fabric up to 102GB/s. This outperforms any other design in the industry by multiple factors. We had already noted that the M1 chip was able to fully saturate its memory bandwidth with a single core and that the bottleneck had been the DRAM itself. On the M1 Max, it seems that we’re hitting the limit of what a core can do or, more precisely, a limit of what the CPU cluster can do.

The little hump between 12MB and 64MB should be the 48MB SLC. The reduction in bandwidth at the 12MB mark signals that the core is somehow limited in bandwidth when evicting cache lines back to the upper memory system. Our test here consists of reading, modifying, and writing back cache lines, with a 1:1 R/W ratio.

Going from 1 core/thread to 2, what the system is actually doing is spreading the workload across the two performance clusters of the SoC, so each thread is on its own cluster and has full access to its 12MB of L2. The “hump” after 12MB reduces in size, ending earlier now at +24MB, which makes sense as the 48MB SLC is now shared between two cores. Bandwidth here increases to 186GB/s.

Adding a third thread introduces a bit of an imbalance across the clusters, and DRAM bandwidth goes to 204GB/s. A fourth thread lands us at 224GB/s, and this appears to be the limit of what the CPUs can achieve on the SoC fabric, as adding more cores and threads beyond this point does not increase the bandwidth to DRAM at all. It’s only when the E-cores, which are in their own cluster, are added in that the bandwidth is able to jump up again, to a maximum of 243GB/s.
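To put the thread scaling in perspective, the quoted figures can be compared against what perfect linear scaling of the single-thread result would give. This is only arithmetic on the numbers in the text; the helper name is hypothetical.

```python
# DRAM bandwidth in GB/s by thread count, as quoted in the text above.
measured = {1: 102, 2: 186, 3: 204, 4: 224}


def scaling_efficiency(results):
    """Measured bandwidth relative to perfect linear scaling of one thread."""
    single = results[1]
    return {n: bw / (n * single) for n, bw in results.items()}


efficiency = scaling_efficiency(measured)
for n, eff in efficiency.items():
    print(f"{n} thread(s): {measured[n]} GB/s, {eff:.0%} of linear scaling")
```

Two threads on separate clusters retain about 91% of linear scaling, while four threads drop to roughly 55%, which is consistent with a fabric-side limit rather than a per-core one.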

While 243GB/s is massive, and overshadows any other design in the industry, it’s still quite far from the 409GB/s the chip is capable of. More importantly for the M1 Max, it’s only slightly higher than the 204GB/s limit of the M1 Pro, so from a CPU-only workload perspective, it doesn’t appear to make sense to get the Max if one is focused just on CPU bandwidth.
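The 409GB/s and 204GB/s peaks follow from the width of the memory interface and its transfer rate. The interface widths and the LPDDR5-6400 transfer rate used below are not stated in this section; they are the commonly cited configurations for the two chips and should be read as assumptions.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_per_second):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


LPDDR5_RATE = 6400e6  # assumed: 6400 mega-transfers per second

m1_max_peak = peak_bandwidth_gbs(512, LPDDR5_RATE)  # assumed 512-bit interface
m1_pro_peak = peak_bandwidth_gbs(256, LPDDR5_RATE)  # assumed 256-bit interface

cpu_max = 243  # GB/s, highest CPU-side figure quoted in the text
cpu_share = cpu_max / m1_max_peak

print(f"M1 Max peak: {m1_max_peak:.1f} GB/s")
print(f"M1 Pro peak: {m1_pro_peak:.1f} GB/s")
print(f"CPU share of M1 Max peak: {cpu_share:.0%}")
```

Under these assumptions the CPU complex reaches a little under 60% of the M1 Max's theoretical peak, while the same 243GB/s would already exceed what the M1 Pro's memory interface can deliver at all.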

That raises the question: why does the M1 Max have such massive bandwidth? The GPU naturally comes to mind; however, in my testing, I’ve had extreme trouble finding workloads that would stress the GPU sufficiently to take advantage of the available bandwidth. Granted, this is also an issue of lacking workloads, but for actual 3D rendering and benchmarks, I haven’t seen the GPU use more than 90GB/s (measured via system performance counters). While I’m sure there’s some productivity workload out there where the GPU is able to stretch its legs, we haven’t been able to identify one yet.

That leaves everything else on the SoC: the media engine, the NPU, and workloads that would simply stress all parts of the chip at the same time. The new media engines on the M1 Pro and Max are now able to decode and encode ProRes RAW formats. The above clip is a 5K 12-bit sample with a bitrate of 1.59Gbps, and the M1 Max is not only able to play it back in real time, it’s able to do it at multiple times the speed, with seamless immediate seeking. Doing the same thing on my 5900X machine results in single-digit frame rates. The SoC DRAM bandwidth while seeking around was roughly 40-50GB/s. I imagine that workloads that stress the CPU, GPU, and media engines all at the same time would be able to take advantage of the full system memory bandwidth, and allow the M1 Max to stretch its legs and differentiate itself more from the M1 Pro and other systems.
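For a sense of scale, the clip's compressed bitrate can be converted to bytes per second and compared with the DRAM traffic observed while seeking. This is a rough sketch using only the figures quoted above; it assumes real-time playback for the compressed stream, and the helper name is hypothetical.

```python
def gbps_to_gigabytes_per_s(gigabits_per_second):
    """Convert a bitrate in Gbps to GB/s (8 bits per byte)."""
    return gigabits_per_second / 8


clip_rate = gbps_to_gigabytes_per_s(1.59)  # compressed stream at 1x playback
dram_low, dram_high = 40, 50               # GB/s, observed while seeking

print(f"compressed stream: {clip_rate:.2f} GB/s")
print(f"DRAM traffic is {dram_low / clip_rate:.0f}x to {dram_high / clip_rate:.0f}x larger")
```

The compressed stream itself accounts for only about 0.2GB/s, so the bulk of the observed traffic presumably comes from moving and processing decoded frames rather than from reading the source file.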


492 Comments


  • noone2 - Tuesday, October 26, 2021 - link

    The laptop will be worthless and insanely outdated by the time an SSD dies, making it irrelevant even if that was the case.
  • flyingpants265 - Sunday, October 31, 2021 - link

    What an extremely dumb comment. Old Macbooks aren't worthless, they hold their value extremely well. If you use the Mac for what it's intended (video work) it's possible that you'll do damage to the drive.

    You really are an absolute idiot if you think there's an excuse for soldering SSDs. It's like welding in the suspension on your car because "the car will be worthless and outdated". No. They have a limited lifespan and need to be replaced when they die.
  • varase - Wednesday, November 3, 2021 - link

    Then you take it in and have the logic board traded for a refurbished one, then restore your data.

    Anyone who doesn't take off-computer backups is an idiot and deserves to lose data whether it sits on a HDD or SSD.

    All drives fail eventually. I have at least two backups of everything, including my ginormous disk array. And when your replaceable SSD dies, it will take everything with it too.
  • coolfactor - Tuesday, October 26, 2021 - link

    I'm typing this on a 2013 MacBook Air. 8 years and going strong. You chose "3000 writes" to sound dramatic, but that's the low end of low-end SSDs, none of which are used in Macs. SSDs can be rated up to 100,000 writes, and Samsung even promotes some of theirs as lasting 10 years under heavy usage. So your argument is weak, sorry.
  • AshlayW - Tuesday, October 26, 2021 - link

    Just as anecdotal as your emotional reply to defend your product/purchase decision. Look up Louis Rossmann on YT if you want to know what kind of company you are supporting.
  • caribbeanblue - Saturday, October 30, 2021 - link

    Unfortunately, these MacBooks are the best laptops on the market. Repairability is only part of the story, and the repairability of a device isn't just about ease of repair that is enabled by hardware design choices, it's also about the company providing board-level schematics to 3rd party repair shops, so users can have access to cost friendly genuine repair. Apple does have a long way to go in that aspect, that is true, however they *definitely* should not be forced to not solder down their SSD or DRAM. Soldering down such components earns you big improvements in terms of performance, energy efficiency, and space savings. If you want a laptop with a socketed RAM & removable SSD, then that's fine, buy something else, but don't act like MacBooks don't have any selling points, you would be delusional for thinking that.
  • varase - Wednesday, November 3, 2021 - link

    Louis Rossmann is a religious zealot.

    He's a repair gnome, not an innovator or designer.
  • UnNameless - Wednesday, November 17, 2021 - link

    LR sadly became a jest! A joker filled with hate! I respected him a lot back in the days he mostly had content on repair stuff! I also agree with him about the Right to Repair and most of the issues regarding Mac stuff repairs! But his war for RtoR became a crusade and went nuts in the last year or so when he also started to bash Apple on software stuff and practically everything he can find awful! And I wouldn't have said anything if he'd be an informed software guy, but he's a repair engineer, a good one and he should have stuck with that! From the fiascos with the Apple services and firewall whitelists, to the Apple OCSP so poorly misunderstood by him and even worse presented, to Apple hashing etc. Ever since he became a couch diva with that freaking cat, instead of a shop repair guy...his true colors and hate just oozed so smoothly from his skin!
  • RealAlexD - Monday, November 1, 2021 - link

    3000 writes are actually a pretty good durability for a pro consumer SSD. While it is true that some SSDs are rated for up to 100k writes, those are SLC devices, which are not really used anymore outside of special cases (Samsung Z-NAND). Normal SSDs will either have TLC or QLC Flash cells (or maybe 2bit MLC, but even Samsung Pro SSDs are now TLC), which don't last nearly as long.
    The TLC SSD with the most writes I could find was a Seagate Nytro Write intensive Server SSD, that promises about 20k writes.

    Conditions also affect the number of writes a Flash cell will survive; here, running warmer can actually increase lifetime. And the advertised durability is a worst-case durability.
  • UnNameless - Wednesday, November 17, 2021 - link

    I agree! Been using my iMac Pro from 2018 till present! Before I started earlier this year my little experiment with Chia plotting, which is known to burn up SSDs like nothing else, I had 99% lifetime drive left in my 1TB SSD. Even before I ran my experiment, I tried to search online to find what kind of nand flash the iMP uses but couldn't find any real concrete stuff as Apple has custom chips in those SSDs. So I took it for a spin and in the course of weeks I wrote in excess of 1PB of data! SSD lifetime dropped from 99% to 86%. If this scales linearly, I reckon you'd have to write in excess of 10PB of data on a 1TB SSD to bring it to critical levels or burn it completely! And I have never heard of anyone that does such a thing!
