Ask the Experts: Heterogeneous and GPU Compute with AMD’s Manju Hegde
by Anand Lal Shimpi on May 14, 2012 3:46 PM EST - Posted in
- CPUs
- AMD
- Ask the Experts
- GPUs
AMD’s Manju Hegde is one of the rare folks I get to interact with who has an extensive background working at both AMD and NVIDIA. He was a co-founder and the CEO of Ageia, a company that originally tried to bring higher quality physics simulation to desktop PCs in the mid-2000s. In 2008, NVIDIA acquired Ageia and Manju went along, becoming NVIDIA’s VP of CUDA Technical Marketing. The CUDA fit was a natural one for Manju, as he had spent the previous three years working on non-graphics workloads for highly parallel processors. Two years later, Manju made his way to AMD to continue pursuing his vision for heterogeneous compute on GPUs. He is currently Corporate VP of Heterogeneous Applications and Developer Solutions at AMD.
Given what we know about the new AMD and its goal of building a Heterogeneous Systems Architecture (HSA), Manju’s position is quite important. For those of you who don’t remember AMD’s 2012 Financial Analyst Day, the formalized AMD strategy is to exploit its GPU advantages on the APU front in as many markets as possible. AMD has a significant GPU performance advantage over Intel, but in order to capitalize on it, AMD needs developer support for heterogeneous compute. A major struggle for everyone in the GPGPU space has been getting applications to take advantage of the incredible horsepower these processors offer. With AMD’s strategy closely married to doing more (but not all, hence the heterogeneous prefix) compute on the GPU, it needs to succeed where others have failed.
The hardware strategy is clear: don’t just build discrete CPUs and GPUs, but instead transition to APUs. This is nothing new, as both AMD and Intel have been headed in this direction for years. Where AMD sets itself apart is that it is willing to dedicate more transistors to the GPU than Intel is. The CPU and GPU are treated almost as equals on AMD APUs, at least when it comes to die area.
The software strategy is what AMD is working on now. AMD’s Fusion12 Developer Summit (AFDS), in its second year, is where developers can go to learn more about AMD’s heterogeneous compute platform and strategy. Why would a developer attend? AMD argues that the speedups offered by heterogeneous compute can be substantial enough that they could enable new features, usage models or experiences that wouldn’t otherwise be possible. In other words, taking advantage of heterogeneous compute can enable differentiation for a developer.
That brings us to today. In advance of this year’s AFDS, Manju has agreed to directly answer your questions about heterogeneous compute, where the industry is headed and anything else AMD will be covering at AFDS. Manju has a BS in Electrical Engineering (IIT, Bombay) and a PhD in Computer Information and Control Engineering (UMich, Ann Arbor), so make the questions as tough as you can. He'll be answering them on May 21st, so keep the submissions coming.
101 Comments
BenchPress - Tuesday, May 15, 2012
GPUs are very wasteful with bandwidth. They have very little cache space per thread, so they're forced to store a lot of data in RAM and constantly read it and write it back.
CPUs are way more efficient because they process each thread much faster and hence need fewer threads, resulting in a large amount of cache space per thread. This in turn gives them very high cache hit rates, which means lower latency, lower power consumption, and higher net bandwidth.
In other words, higher RAM bandwidth doesn't actually make GPUs any better at extracting effective performance from it. Also, CPUs can still move to DDR4 and beyond when needed, while GPUs are already pushing the limits and will have to resort to bigger caches in the near future, effectively sacrificing computing density and becoming more like CPUs.
Last but not least, the APU is limited by the socket bandwidth, so its GPU has no bandwidth advantage over the CPU.
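The cache argument in this comment can be put as a back-of-the-envelope model. The sketch below assumes (hypothetically) that only cache misses reach DRAM, that every access is the same size, and that the cache itself is never the bottleneck; under those assumptions a chip can serve memory requests at up to the DRAM bandwidth divided by the miss rate. The 100 GB/s figure is illustrative, not a measurement of any real CPU or GPU, and `max_request_bandwidth` is a hypothetical helper name.

```python
def max_request_bandwidth(dram_gbps: float, hit_rate: float) -> float:
    """Upper bound (GB/s) on loads/stores served when only misses reach DRAM.

    Hypothetical model: equal-sized accesses, the cache is never the
    bottleneck, and writes are treated the same as reads.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    miss_rate = 1.0 - hit_rate
    # DRAM must supply miss_rate * request_rate, so request_rate <= dram / miss_rate.
    return dram_gbps / miss_rate


# Illustrative numbers only: same DRAM bandwidth, different cache hit rates.
for h in (0.0, 0.5, 0.9, 0.95):
    print(f"hit rate {h:.2f}: up to {max_request_bandwidth(100.0, h):.0f} GB/s served")
```

Going from a 50% to a 90% hit rate raises the ceiling fivefold with no change in DRAM bandwidth, which is the effect the comment is pointing at. Real caches have their own bandwidth limits, so this is an upper bound rather than a prediction.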
DarkUltra - Tuesday, May 15, 2012
1. WDDM 1.2 requires preemptive multitasking, so the GPU should never be clogged up anymore. Threads will be swapped in and out very quickly.
BenchPress - Tuesday, May 15, 2012
What makes you think that switching contexts can be done quickly? There's way more register state to be stored and restored, buffers to be flushed, caches to be warmed, etc. than on a CPU.
suty455 - Monday, May 14, 2012
I just need to understand why AMD did such a poor job with the latest CPUs. Even with Win 8 they lag so far behind Intel, it's crazy. Is the unified approach ever going to allow AMD to close the gap to Intel's processors? And what kind of influence do you have with the major software houses, e.g. MS, to get the unified processor used to its fullest extent, so that it actually makes a difference in real-world usage and not just in benchmarks?
I ask as a confirmed AMD fan who frankly can no longer ignore the massive performance increase I can get from switching to Intel.
Jedibeeftrix - Monday, May 14, 2012
1. When will AMD be able to demonstrate real competence at GPU compute, vis-a-vis Nvidia and its CUDA platform, by having its GPUs function properly as a render source for Blender/CYCLES?
2. What steps are necessary to get it there?
----------------------------
Blender (and its new CYCLES GPU renderer) is a poster-child for the GPU compute world.
It already runs on OpenCL, but it only works properly on the CPU or via Nvidia CUDA.
The Blender developers are already trying to make OpenCL the default platform because it would be a cross-platform and cross-architecture solution; however, on AMD it does not function adequately.
What are you doing to help the development of AMD-GPU on OpenCL with the Blender foundation?
With what driver release would you hope Catalyst will reach an acceptable level of functionality?
With what driver release would you hope Catalyst will reach broad parity with Nvidia/CUDA?
Is the upcoming AMD OpenCL APP SDK v1.2 a part of this strategy?
Above all; when will my 7970 rock at CYCLES?
Kind regards
palladium - Monday, May 14, 2012
At the moment the CPU and GPU are relatively independent of each other in terms of operations, and both enjoy an (almost) equal share of die space. Do you expect AMD to head in the near future in a direction similar to the Cell processor (in the PS3), where the CPU handles the OS and passes most of the intensive calculations on to the GPU?
SilthDraeth - Monday, May 14, 2012
Just saying. These questions are for the AMD guy, and this BenchPress guy comes in here spamming AVX2 to answer all the questions posed to AMD.
Makes you go hmmm...
palladium - Monday, May 14, 2012
Yes, very suspicious indeed.
BenchPress - Tuesday, May 15, 2012
I just want what's best for all of us: homogeneous computing.
Computing density is increasing quadratically, but bandwidth only increases linearly. Hence computing has to be done as locally as possible. It's inevitable that sooner or later the CPU and GPU will fully merge (they've been converging for many years). So HSA has no future, while AVX2 is exactly the merging of GPU technology into the CPU.
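Taking the comment's premise at face value, its consequence can be worked out directly. In the sketch below, `s` is a hypothetical scaling factor, and compute is assumed to grow as s² while off-chip bandwidth grows as s; both growth rates are the commenter's assumptions, not measured data. Their ratio, the number of operations each byte fetched from memory must feed to keep the chip busy, then grows linearly with s.

```python
def required_intensity(s: float) -> float:
    """Relative FLOPs needed per byte of off-chip traffic to keep all
    compute busy, assuming compute ~ s**2 and bandwidth ~ s (hypothetical)."""
    compute = s ** 2    # assumed quadratic growth in compute
    bandwidth = s       # assumed linear growth in off-chip bandwidth
    return compute / bandwidth


for s in (1, 2, 4, 8):
    print(f"scale {s}: compute x{s**2}, bandwidth x{s}, "
          f"needed FLOPs/byte x{required_intensity(s):g}")
```

Under these assumptions each doubling of s doubles the work every byte must support, which is why the comment concludes that data has to stay as close to the compute units as possible.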
Gather used to be a GPU-exclusive feature, giving GPUs a massive benefit, but now it's part of AVX2.
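For readers unfamiliar with gather: an ordinary SIMD load fills all lanes from adjacent memory, while a gather lets each lane load from its own index, which is what table lookups and indirect array accesses need. AVX2 exposes this through instructions such as `vgatherdps` (the `_mm256_i32gather_ps` intrinsic in C). The Python below is only a lane-by-lane model of the semantics for an 8-lane register of 32-bit floats, not a model of how the hardware performs the loads; `contiguous_load` and `gather` are hypothetical helper names.

```python
from typing import List, Sequence

LANES = 8  # one 256-bit AVX2 register holds eight 32-bit floats


def contiguous_load(table: Sequence[float], start: int) -> List[float]:
    """Ordinary vector load: all 8 lanes come from adjacent elements."""
    return list(table[start:start + LANES])


def gather(table: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Model of a SIMD gather: lane i independently loads table[indices[i]]."""
    if len(indices) != LANES:
        raise ValueError(f"expected {LANES} indices, got {len(indices)}")
    if any(i < 0 or i >= len(table) for i in indices):
        raise IndexError("gather index out of range")
    return [table[i] for i in indices]


table = [float(10 * i) for i in range(16)]
print(contiguous_load(table, 4))                  # lanes hold elements 4..11
print(gather(table, [3, 0, 15, 7, 7, 1, 12, 5]))  # → [30.0, 0.0, 150.0, 70.0, 70.0, 10.0, 120.0, 50.0]
```

Indices may repeat and appear in any order, which a contiguous load cannot express without several loads and shuffles.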
_vor_ - Wednesday, May 16, 2012
Give it a rest, guy. People would like to hear from AMD.