Ask the Experts: Heterogeneous and GPU Compute with AMD’s Manju Hegde
by Anand Lal Shimpi on May 14, 2012 3:46 PM EST - Posted in
- CPUs
- AMD
- Ask the Experts
- GPUs
AMD’s Manju Hegde is one of the rare folks I get to interact with who has an extensive background working at both AMD and NVIDIA. He was one of the co-founders and CEO of Ageia, a company that originally tried to bring higher quality physics simulation to desktop PCs in the mid-2000s. In 2008, NVIDIA acquired Ageia and Manju went along, becoming NVIDIA’s VP of CUDA Technical Marketing. The CUDA fit was a natural one for Manju as he had spent the previous three years working on non-graphics workloads for highly parallel processors. Two years later, Manju made his way to AMD to continue his vision for heterogeneous compute work on GPUs. He currently serves as Corporate VP of Heterogeneous Applications and Developer Solutions at AMD.
Given what we know about the new AMD and its goal of building a Heterogeneous Systems Architecture (HSA), Manju’s position is quite important. For those of you who don’t remember back to AMD’s 2012 Financial Analyst Day, the formalized AMD strategy is to exploit its GPU advantages on the APU front in as many markets as possible. AMD has a significant GPU performance advantage compared to Intel, but in order to capitalize on that it needs developer support for heterogeneous compute. A major struggle everyone in the GPGPU space faced was enabling applications that took advantage of the incredible horsepower these processors offered. With AMD’s strategy closely married to doing more (but not all, hence the heterogeneous prefix) compute on the GPU, it needs to succeed where others have failed.
The hardware strategy is clear: don’t just build discrete CPUs and GPUs, but instead transition to APUs. This is nothing new, as both AMD and Intel have been headed in this direction for years. Where AMD sets itself apart is that it is willing to dedicate more transistors to the GPU than Intel. The CPU and GPU are treated almost as equal class citizens on AMD APUs, at least when it comes to die area.
The software strategy is what AMD is working on now. AMD’s Fusion12 Developer Summit (AFDS), in its second year, is where developers can go to learn more about AMD’s heterogeneous compute platform and strategy. Why would a developer attend? AMD argues that the speedups offered by heterogeneous compute can be substantial enough that they could enable new features, usage models or experiences that wouldn’t otherwise be possible. In other words, taking advantage of heterogeneous compute can enable differentiation for a developer.
That brings us to today. In advance of this year’s AFDS, Manju has agreed to directly answer your questions about heterogeneous compute, where the industry is headed and anything else AMD will be covering at AFDS. Manju has a BS in Electrical Engineering (IIT, Bombay) and a PhD in Computer Information and Control Engineering (UMich, Ann Arbor) so make the questions as tough as you can. He'll be answering them on May 21st so keep the submissions coming.
101 Comments
BenchPress - Thursday, May 17, 2012 - link
That's an interesting thought experiment, but I don't think AMD should be hoping for some miracle technology to save HSA. On-chip optical interconnects won't be viable for the consumer market for at least another 10 years, and heterogeneous computing will run into bandwidth walls long before that. And it remains to be seen whether the bandwidth offered by optical technology will make the whole issue go away or just postpone it a little longer.

Secondly, the issue isn't just bandwidth, but also latency. Light travels only slightly faster than an electrical signal in copper, and that's without accounting for transmitter and receiver latency. So while a homogeneous CPU core can switch between scalar code and vector code from one cycle to the next, for a heterogeneous architecture it still takes a lot of time to send some data over and signal the processing.
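To put a shape on that round trip, here is a minimal host-side sketch in OpenCL (the API AMD's GPGPU stack was built around at the time). It is illustrative only: the kernel name scale and the buffer size are made up, error checking is omitted, and it assumes a runtime with one GPU device. Every enqueue below crosses the host/driver/device boundary the comment is describing.

```c
/* Hedged sketch: times one full offload round trip (copy in, launch,
 * copy out).  Kernel name and sizes are hypothetical; no error checks. */
#include <CL/cl.h>
#include <stdio.h>
#include <time.h>

static const char *src =
    "__kernel void scale(__global float *buf) {"
    "    buf[get_global_id(0)] *= 2.0f;"
    "}";

int main(void) {
    enum { N = 4096 };
    float data[N];
    for (int i = 0; i < N; i++) data[i] = (float)i;

    cl_platform_id plat;  cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, sizeof data, NULL, NULL);
    clSetKernelArg(k, 0, sizeof buf, &buf);
    size_t gws = N;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* Each of these three calls crosses the host/device boundary. */
    clEnqueueWriteBuffer(q, buf, CL_TRUE, 0, sizeof data, data, 0, NULL, NULL);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &gws, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof data, data, 0, NULL, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("offload round trip for %d floats: %.1f us\n", N, us);
    return 0;
}
```

For a buffer this small, the fixed cost of the boundary crossings rather than the arithmetic dominates the measured time, which is the latency argument being made above.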
BenchPress - Monday, May 14, 2012 - link
I don't think HSA is going to work. With Haswell we'll have AVX2, which brings key GPU technology right into the CPU cores. And the CPU is way more suitable for generic computing anyway thanks to its large caches and out-of-order execution. With AVX2 there's also no overhead from round-trip delays or bandwidth or APIs. Future extensions like AVX-1024 would totally eradicate the chances of the GPU ever becoming superior at general purpose computing without sacrificing a lot of graphics performance.
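For anyone who hasn't seen what "key GPU technology in the CPU core" looks like concretely, here is a minimal sketch at the intrinsics level, assuming a Haswell-class compiler target (e.g. gcc -mavx2 -mfma); the data and indices are arbitrary:

```c
/* Hedged sketch of two headline AVX2-era additions: an 8-wide gather
 * load and a fused multiply-add, both issued from a normal CPU core
 * with no offload step. */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float table[16];
    int   idx[8] = {0, 2, 4, 6, 8, 10, 12, 14};
    for (int i = 0; i < 16; i++) table[i] = (float)i;

    __m256  va = _mm256_loadu_ps(a);
    __m256  vb = _mm256_loadu_ps(b);
    __m256i vi = _mm256_loadu_si256((const __m256i *)idx);

    /* AVX2 gather: eight indexed loads in one instruction (scale = 4 bytes). */
    __m256 vt = _mm256_i32gather_ps(table, vi, 4);

    /* FMA: r = va * vb + vt in a single instruction. */
    __m256 r = _mm256_fmadd_ps(va, vb, vt);

    float out[8];
    _mm256_storeu_ps(out, r);
    for (int i = 0; i < 8; i++) printf("%g ", out[i]);
    printf("\n");
    return 0;
}
```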
MrSpadge - Monday, May 14, 2012 - link
Think big: there could be a couple of "AVX-1024" FPUs... which could just as well be used as shaders by the GPU. That's the true fusion.
BenchPress - Monday, May 14, 2012 - link
HSA uses a specific binary format that is not compatible with VEX (the encoding format used by AVX2 and AVX, which is an extension of x86). So it's not going to support AVX-1024.

But yes, AVX-1024 could be used for shader processing. It's just not going to be heterogeneous. It's a homogeneous part of the CPU's micro-architecture and instruction set.
codedivine - Monday, May 14, 2012 - link
According to publicly available information, HSA is also not tied to a specific binary format and will be JIT-compiled to the actual ISA.

Given that AMD already supports AVX and FMA4, and will support FMA3 going forward (i.e. most of the AVX functionality), I expect that they will support AVX through HSA just fine.
Please stop your misinformed posts.
BenchPress - Tuesday, May 15, 2012 - link
Duh, of course it can be JIT-compiled. But JIT-compilation doesn't solve the actual problem. We've had JIT-compiled throughput computing for many years and it got us nowhere...

The real problem is heterogeneous computing itself. You just can't get good performance by moving work between generic computing units with different instruction sets.
It's going to be quite ironic when HSA actually runs better on an Intel CPU with AVX2.
Penti - Monday, May 14, 2012 - link
Actually Trinity/Piledriver uses a normal FMA3 AVX; when HNI will come into effect at AMD too is anyone's guess. Some of the FMA3 will be compatible between Haswell and Piledriver. For the HSA virtual ISA you just need to tune the LLVM backend to the Intel processors, or simply compile for Intel to begin with. It's not something tied to hardware anyway. That is not all that HSA is, though. HSAIL isn't really a toolkit or API either. Not yet at least. It won't really replace all the other tools.
TeXWiller - Monday, May 14, 2012 - link
Heterogeneous computing is ultimately not about using graphics cores as vector processors, but about the resource utilization of the whole chip, assuming a parallel workload with a set of sequential sections. Without an open abstraction layer the developer would have less choice on those "wimpy" throughput cores, and the additional accelerators in the system might be left underutilized.

I'm personally expecting Intel and AMD to have a selection of wide and narrow cores on the same x86 chip, executing sequential and parallel sections respectively. Otherwise scaling would hit a concrete wall in the near future, even assuming software that is 95-97% parallelizable.
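That wall is easy to put numbers on. Below is a minimal sketch of Amdahl's law, speedup(n) = 1 / ((1 - p) + p / n), evaluated at the 95-97% parallel fractions mentioned above; the core counts are arbitrary:

```c
/* Amdahl's law: even with unbounded cores, speedup is capped at 1/(1-p). */
#include <stdio.h>

static double amdahl(double p, double n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    const double fractions[] = {0.95, 0.97};
    const int cores[] = {4, 16, 64, 256, 1024};

    for (int f = 0; f < 2; f++) {
        double p = fractions[f];
        printf("parallel fraction %.0f%% (cap %.1fx):\n",
               p * 100.0, 1.0 / (1.0 - p));
        for (int c = 0; c < 5; c++)
            printf("  %4d cores -> %.1fx\n", cores[c], amdahl(p, cores[c]));
    }
    return 0;
}
```

Even 97% parallel code tops out at roughly 33x no matter how many narrow cores are added, which is why the sequential sections need a few wide, fast cores alongside the throughput cores.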
BenchPress - Tuesday, May 15, 2012 - link
You're assuming that code can be strictly categorized as sequential or parallel. This is never the case. You're always losing performance when forcing code to run on either a scalar CPU or a parallel GPU.

A CPU with AVX2 simply combines the best of both worlds. No need to move data around to process it by another core. Switch instantly between sequential and parallel code, without synchronization overhead.
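As a sketch of that claim (not any vendor's actual code; it assumes an AVX-capable compiler, e.g. gcc -mavx): scalar control flow and 8-wide vector math can interleave inside one function, with no copies or API calls between the two. The function and flag layout are hypothetical.

```c
/* Hedged sketch: the branch is "sequential" work for the out-of-order
 * core; the very next instructions are vector ops on the same cached
 * data, with no offload or synchronization step in between. */
#include <immintrin.h>
#include <stdio.h>

static void process(const float *in, float *out, const int *flags, int rows) {
    for (int r = 0; r < rows; r++) {
        if (!flags[r])          /* scalar, data-dependent decision */
            continue;
        __m256 v = _mm256_loadu_ps(in + 8 * r);       /* parallel part */
        v = _mm256_mul_ps(v, _mm256_set1_ps(2.0f));
        _mm256_storeu_ps(out + 8 * r, v);
    }
}

int main(void) {
    float in[32], out[32] = {0};
    int flags[4] = {1, 0, 1, 0};
    for (int i = 0; i < 32; i++) in[i] = (float)i;
    process(in, out, flags, 4);
    printf("row 0 doubled: %g %g ...\n", out[0], out[1]);
    return 0;
}
```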
TeXWiller - Tuesday, May 15, 2012 - link
I'm simply assuming that Amdahl's law will still be in force in the future. It sounds like you are creating a false dilemma with the data-moving argument. The heterogeneous model that is emerging is headed precisely in this direction, apparently for Nvidia as well.