Recent Developments: OpenPOWER's Potential HPC Comeback

Those who suggested that IBM's scale-out servers were just a half-hearted effort that would quickly get strangled by the desire to protect the high-margin big iron servers could not have been more wrong. IBM just launched three new servers, and all of them are affordable scale-out servers. IBM is now very aggressively going after the market it has (almost) completely lost to Intel's Xeon: HPC. At the same time, IBM is emphasizing its determination to play an important role in the emerging "machine learning" and "Big Data" markets.

The S822LC "Big Data" and S821LC use mature and proven (some would say "older") technology: the "OpenPOWER version" of the POWER8 and NVIDIA's Tesla K80. There are some interesting new facts to discuss, though. First of all, these servers are made by Supermicro, confirming the close relationship between the two companies and that OpenPOWER is indeed "Open". Supermicro is the market leader in HPC, and the fact that Supermicro chose to invest in OpenPOWER is a promising sign: IBM is on to something, and this is not another "me too" effort.

Secondly, these servers use (registered) DDR4 RAM as opposed to DDR3 as found in servers like the S812LC and S822L. Since they are still communicating via the "Centaur" memory buffers, this will not give any tangible performance boost, but it means that the servers are making use of the most popular and thus cheapest server memory technology.

The 2U S822LC "Big Data" looks like a solid offering. Pricing starts at $5999 (one 3.3 GHz 8-core, 64 GB RAM, no GPU), but realistically a fully equipped server (two 10-cores, one K80, 128 GB) is around $16000. If you do not need the GPU, a server with two 10-cores, 256 GB, 2x 10 GB and two 1 TB disks costs around $13341. The CPU inside is still the 190W TDP single-chip 10-core (2.9 GHz base, up to 3.5 GHz boost) that we tested a while ago. There is also an 8-core (3.3 GHz base, up to 3.7 GHz boost) alternative.

The 1U S821LC starts at $5900. The 1U form factor limits the POWER8 to much lower power envelopes. The 8-core chip runs at 2.3 GHz (135W TDP); the 10-core is allowed to consume a higher 145W, but runs at a meager (by POWER8 standards) 2.1 GHz. We can imagine that this is indeed based on feedback from customers with space-constrained datacenters, as IBM claims. We feel, however, that it makes the S821LC less attractive, as one of the distinguishing features of the POWER8 is its high single-threaded performance. The POWER8 was simply not designed to run inside a 1U server. On the other side of the coin, a 2.1 GHz 10-core might still be fast enough to feed the GPU with the necessary data in some HPC applications.
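To put the clock and TDP figures above side by side, here is a minimal Python sketch. It uses only the numbers quoted in this section. The class and function names are hypothetical, the 2.9 GHz figure for the 2U part is assumed to be its sustained (non-boost) clock, and "cores x GHz" is only a crude throughput proxy that ignores boost, SMT and memory bandwidth. The 8-core 2U chip is left out because its TDP is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Power8Config:
    """One CPU option as quoted in the text (names are illustrative)."""
    server: str
    cores: int
    base_ghz: float  # assumed to be the sustained (non-boost) clock
    tdp_watts: int

    @property
    def watts_per_core(self) -> float:
        return self.tdp_watts / self.cores

    @property
    def aggregate_ghz(self) -> float:
        # Crude proxy: cores x clock, ignoring boost, SMT and memory.
        return self.cores * self.base_ghz


# Figures taken from the two paragraphs above.
CONFIGS = [
    Power8Config("S822LC 2U, 10-core", 10, 2.9, 190),
    Power8Config("S821LC 1U, 8-core", 8, 2.3, 135),
    Power8Config("S821LC 1U, 10-core", 10, 2.1, 145),
]


def clock_ratio(slow: Power8Config, fast: Power8Config) -> float:
    """Per-core clock of `slow` as a fraction of `fast`."""
    return slow.base_ghz / fast.base_ghz


def report(configs=CONFIGS) -> list[str]:
    """Format one summary line per configuration."""
    return [
        f"{c.server}: {c.watts_per_core:.1f} W/core, "
        f"{c.aggregate_ghz:.1f} core-GHz"
        for c in configs
    ]


if __name__ == "__main__":
    for line in report():
        print(line)
    ratio = clock_ratio(CONFIGS[2], CONFIGS[0])
    print(f"1U 10-core clock vs 2U 10-core: {ratio:.0%}")
```

Under these assumptions, each core of the 1U 10-core part runs at roughly 72% of the 2U part's base clock, which is the single-threaded cost described above.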


49 Comments


  • JohanAnandtech - Sunday, September 25, 2016

    Thanks Jesper. Looks like I will have to spend even more time on that system :-). And indeed, out of the box performance is important if IBM ever wants to get a piece of the x86 market.
  • luminarian - Thursday, September 15, 2016

    It was my understanding that the SMT mode on the POWER8 could be changed. Depending on the type of work this would make a giant difference, especially with MySQL/MariaDB, which are limited to 1 process/thread per connection.

    With databases the real winner would be one that supports parallel queries, such as PostgreSQL 9.6, DB2, Oracle, etc.

    Also, your benchmark could very easily be limiting the POWER8 if it's not opening enough connections to fill out the number of threads that thing can handle; remember, MySQL/MariaDB are 1 process/thread per connection. A lot of database benchmarks default to a small number of connections, and this thing has 160 threads with the dual 10-core. I would suggest trying to run that same benchmark again, but at the same time from multiple client machines. See if the bench takes a larger dip when a second client machine runs the same bench or if the bench shows similar figures (granted, this might hit the HD I/O limit on the POWER8 server).

    So yeah, that and try SMT-2 and SMT-4 modes.
  • JohanAnandtech - Friday, September 16, 2016

    Hi, I tried SMT-4, throughput was about 25% worse: 11k instead of 14k+. 95th percentile response time was better: 3.7 ms.
  • JohanAnandtech - Friday, September 16, 2016

    Updated the MySQL graphs with SMT-4 data. Our Spark tests get worse with SMT-4 and that is also true for SPECjbb.
  • luminarian - Friday, September 16, 2016

    Awesome, Thanks for the response.
  • Meteor2 - Friday, September 16, 2016

    The HPC potential is awesome. You can really see why Oak Ridge chose POWER9 and Volta.
  • Communism - Sunday, September 18, 2016

    Pretty sure most of the reason for that is Intel blocking every attempt Nvidia makes at getting a high bandwidth interface bolted onto a Xeon.

    After all, one of the main reasons that Intel blocked Nvidia's chipset business way back in the day was to limit the ability of other companies to bolt high bandwidth accelerators onto Intel chips (presumably to protect their own initiatives in that space).
  • Klimax - Saturday, September 17, 2016

    Not terribly impressive. You have to get SW to play nice and spend time fine-tuning it to outperform Intel, and it will cost you in power and cooling. More like "yes, if you get a quite bigger TDP you get a bit more power". And it won't be terribly good in many cases. (Like public-facing services where latency is critical.)

    Maybe if you are in the USA and can waste admins' and devs' time and waste a lot on cooling and electricity, then maybe. Otherwise why bother...
  • SarahKerrigan - Sunday, September 18, 2016

    I don't see this as a bad result. This is a 22nm processor, over two years old, and it beats Haswell-EP (which is newer) on efficiency. Broadwell-EP is brand new, and P9 should come out well before the end of BDW-EP's lifecycle.
  • Kevin G - Sunday, September 18, 2016

    Some of the POWER9 chips will be out next year, though I suspect that the scale-up models may be early 2018 parts. Considering that those chips go into IBM's big iron Unix servers, they tend to launch a bit later than the low end models, so it isn't game changing.

    The real question is when SkyLake-EP/EX will launch in comparison to the scale-out POWER9 chips. I was expecting the first half of 2017 for the Intel parts, but I have no reference as to when to expect the POWER9 SO chips. Thus there is a chance Intel can come out first.

    Intel also wants a quick transition to SkyLake-EP/EX as they unify those two lines to some extent and provide some major platform improvements. I'm thinking Broadwell-EP/EX will have a relatively short life span compared to Haswell-EP/EX. This mimics much of what happened on the desktop and the challenge to move to 14 nm.
