Purley Mark Two: Cascade Lake-SP

On the processor front, the on-paper hardware specifications of Cascade Lake Xeons offer no surprises, mainly because the stock design is identical to Skylake Xeons. Users will be offered up to 28 cores with hyperthreading, the same levels of cache, the same UPI connectivity, the same number of PCIe lanes, the same number of channels of memory, and the same maximum supported frequency of memory.

Still to be answered is whether the XCC/HCC/LCC silicon dies, from which the processor stack will come, will be the same. There is also no information about memory capacity limitations.

What Intel is saying on this slide however is in the second bullet point:

  • Process tuning, frequency push, targeted performance improvements

We believe this is a tie-in to Intel improving its 14nm process further, tuning it for voltage and frequency, or a better binning. At this point Intel has not stated if Cascade Lake is using the ‘14++’ process node, to use Intel’s own naming scheme, although we expect it to be the case. We suspect that Intel might drop the +++ naming scheme altogether, if this isn’t disclosed closer to the time. However a drive to 10% better frequency at the same voltage would be warmly welcomed.

Where some of the performance will come from is in the new deep learning instructions, as well as the support for Optane DIMMs.

AVX-512 VNNI Instructions for Deep Learning

The world of AVX-512 instruction support is completely confusing. Different processors and different families support various sets of instructions, and it is hard to keep track of them all, let alone code for them. Luckily for Intel (and others), companies that invest in deep learning tend to focus on one particular set of microarchitectures for their work. As a result, Intel has been working with software developers to optimize code paths for Xeon Scalable systems. In fact, Intel is claiming to have already secured a 5.4x increase in inference throughput on Caffe / ResNet50 since the launch of Skylake, partly through code and parallelism optimizations, and partly through reduced precision and multiple concurrent instances.

With VNNI, or Vector Neural Network Instructions, Intel expects to double its neural network performance with Cascade Lake. Behind VNNI are two key instructions that can be optimized and decoded, reducing work.

Both instructions aim to reduce the number of required manipulations within inner convolution loops for neural networks.

VPDPWSSD, the INT16 version of the two instructions, fuses two INT16 instructions and uses a third INT32 operand as the accumulator, replacing the VPMADDWD and VPADDD math that current AVX-512 would use.
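To make the fusion concrete, here is a minimal scalar sketch in Python of what happens in a single 32-bit lane. It is an illustration only, not Intel's implementation: the function names are hypothetical, and the model assumes the non-saturating behavior where results simply wrap to 32 bits.

```python
def to_int32(x):
    """Wrap a Python int to a signed 32-bit value, as a 32-bit lane would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def madd_then_add(acc, a, b):
    """Two-step path: multiply-add the INT16 pair, then add the accumulator."""
    pair_sum = to_int32(a[0] * b[0] + a[1] * b[1])  # multiply-add step
    return to_int32(acc + pair_sum)                  # separate add step


def fused_dot_accumulate(acc, a, b):
    """Fused path: one operation yields acc + a0*b0 + a1*b1."""
    return to_int32(acc + a[0] * b[0] + a[1] * b[1])


# One lane: an INT32 accumulator plus two pairs of signed INT16 values.
acc, a, b = 10, (3, -4), (5, 6)
print(madd_then_add(acc, a, b), fused_dot_accumulate(acc, a, b))
```

Both paths give the same value for the lane; the difference is that the fused form needs one instruction where the older sequence needs two.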

VPDPBUSD does a similar thing, but takes it one stage back, using INT8 inputs to reduce a three-instruction path into a one-instruction implementation.
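The INT8 case can be sketched the same way for one 32-bit lane, which holds four unsigned 8-bit values multiplied against four signed 8-bit values. This is a hedged scalar model with hypothetical function names. The INT16 saturation in the three-step version reflects our understanding of how the older byte multiply-add step behaves, and should be treated as an assumption rather than a statement from Intel's slides.

```python
def to_int32(x):
    """Wrap a Python int to a signed 32-bit value, as a 32-bit lane would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def saturate_int16(x):
    """Clamp to the signed 16-bit range (assumed for the intermediate step)."""
    return max(-32768, min(32767, x))


def three_step_path(acc, a_u8, b_s8):
    """Older sequence: byte multiply-add, widen to INT32, then accumulate."""
    # Step 1: u8 x s8 products, adjacent pairs summed into saturating INT16.
    lo = saturate_int16(a_u8[0] * b_s8[0] + a_u8[1] * b_s8[1])
    hi = saturate_int16(a_u8[2] * b_s8[2] + a_u8[3] * b_s8[3])
    # Step 2: sum the two INT16 values into an INT32.
    widened = lo + hi
    # Step 3: add the running INT32 accumulator.
    return to_int32(acc + widened)


def fused_path(acc, a_u8, b_s8):
    """Fused form: all four products are summed into the accumulator at once."""
    return to_int32(acc + sum(x * y for x, y in zip(a_u8, b_s8)))


acc, a_u8, b_s8 = 100, (1, 2, 3, 4), (5, -6, 7, -8)
print(three_step_path(acc, a_u8, b_s8), fused_path(acc, a_u8, b_s8))
```

For typical values the two paths agree. Under the saturation assumption above, they can differ at the extremes (for example, all inputs at 255 and 127), because the fused form never passes through a 16-bit intermediate.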

The key part from Intel here is that with the right data-set, these two instructions will improve the number of elements processed per cycle by 2x and 3x respectively.

Framework and library support for these new instructions will be part of Caffe, mxnet, TensorFlow, and Intel’s MKL-DNN.




  • Yojimbo - Sunday, August 19, 2018 - link

    It's also very big in China.
  • abufrejoval - Monday, August 20, 2018 - link

    What about the Control Flow Integrity extensions announced in 2016? Are they mentioned anywhere? Does anyone know what AMD is doing about them?
  • HStewart - Monday, August 20, 2018 - link

    "Does anyone know what AMD is doing about them?"

    That is real good question and thanks for the Link? - I would be curious about what CPU's will have these extensions. My guess initially it will be part of hardware / software changes mention here. From a quick look at document , it looks primary aim at OS developers especially with mention of task switch.

    As for your original question, My guess is that AMD will be adding similar instruction in a future - it just makes it hard for OS developers - unless AMD Licenses the additions so it has similar instructions.
  • iAPX - Monday, August 20, 2018 - link

    Why and how "Mitigation" becomes "Fix"?!?

    Intel is clear about their lack of Fix again, but only mitigations. In the article it's not the same story.
  • moozooh - Monday, August 20, 2018 - link

    The main reason is likely that CPU R&D cycle normally takes some 2+ years, and the Spectre/Meltdown vulnerabilities were only fully understood sometime midways during Cascade Lake's hardware design cycle where only minor architectural changes could be made. I believe you can only expect full-scale fix in microarchitectures that entered its initial development phases in late 2017 or so. Which means they won't enter the market until mid-2019 at the earliest. So, come back for Ice Lake and its sister families I guess.
  • HStewart - Monday, August 20, 2018 - link

    I believe the difference is "Mitigation" is actually done in software or microcode downloaded to chip - but "Fix" is actually a change in actual hardware
  • edzieba - Monday, August 20, 2018 - link

    There is no "fix" without removing Speculative Execution just like there is no hardware "fix" for buffer over/underruns. The fix is in software design, the mitigation is in hardware to compensate for the change in software design.
  • HStewart - Monday, August 20, 2018 - link

    I believe that with Spectre 1 - this appears not to require a CPU change and others similar to this - Spectre 2 requires no instructions and Spectre 3 requires no hardware changes

    Not sure about other changes but keep in mind the kernel can prevent rogue programs from causing problems but will slow the system down by using io protection technique - only thing is what kind of performance hit this causes - this can be fixed in hardware with new hardware that the OS

    To me as OS developer in the late 80's and early 90's, my knowledge now maybe limited - but cause of change in job - but it pretty sick that OS developers and CPU have spend resource to correct issues for situation with hackers and such to exploit hardware. Keep in mind these problems don't just effect Intel but also include ARM and AMD cpus.
  • HStewart - Monday, August 20, 2018 - link

    A link for the top part - of course part of this is just my opinion based on my previous experience as OS developer


    I did notice a real CPU defect in IBM 486SLC - when switching from 286 protected mode to 386 protected mode the IBM 486SLC had a defect according to IBM that the cache was inverted causing an exception and hard lock to occur. I believe this was tracked down in early 90's
  • Elstar - Monday, August 20, 2018 - link

    "Variant 1 is still to be tackled at the OS level". I wish Intel were more clear about this. They clearly view variant 1 as a problem for any and all software to solve, not just OS/VMM software. The only thing magical about the OS/VMM is that they're more popular attack vectors.
