SambaNova and Intel Announce Blueprint for Heterogeneous Inference: GPUs For Prefill, SambaNova RDUs for Decode, and Intel® Xeon® 6 CPUs for Agentic Tools

Posted by Primary_Olive_5444@reddit | hardware | View on Reddit | 4 comments

https://sambanova.ai/press/sambanova-announces-collaboration-with-intel-on-ai-solution

SambaNova announcement:

In this new design:

https://hc2024.hotchips.org/assets/program/conference/day1/48_HC2024.Sambanova.Prabhakar.final-withoutvideo.pdf


It seems like an RDU is built for faster data movement (loading and unloading weights) during inference, relative to GPU hardware.

For a given inference task, you first load all the expert models relevant to that task/prompt into DDR memory, then fast-swap them in and out across the different phases until the task completes.

Phase 1: use model A, which is best for this part of the workload

Phase 2: load model B (which is better for the next part of the work) and evict A (maybe start prefetching C in the meantime?)

Phase 3: run model C (evict B)

Is this how it works roughly?
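If that reading is right, the pattern is basically double buffering: run the current expert while prefetching the next one from DDR. Here is a minimal sketch in plain Python; all names (`load_expert`, `run_phase`, the model names) are hypothetical stand-ins, not any real SambaNova or Intel API:

```python
import threading
from queue import Queue

def load_expert(name):
    # Stand-in for copying an expert's weights from DDR into fast memory.
    return {"name": name}

def run_phase(expert, phase_input):
    # Stand-in for executing one phase of the task on the loaded expert.
    return f"{phase_input} -> {expert['name']}"

def run_pipeline(phases, task_input):
    """Run phases in order, prefetching the next expert while the
    current one is computing (double buffering)."""
    result = task_input
    current = load_expert(phases[0])
    for i in range(len(phases)):
        prefetched = Queue(maxsize=1)
        loader = None
        if i + 1 < len(phases):
            # Start loading the next expert in the background.
            nxt = phases[i + 1]
            loader = threading.Thread(
                target=lambda: prefetched.put(load_expert(nxt)))
            loader.start()
        result = run_phase(current, result)  # compute overlaps the load
        if loader:
            loader.join()
            current = prefetched.get()       # swap: B replaces A, etc.
    return result

print(run_pipeline(["model A", "model B", "model C"], "prompt"))
# prints "prompt -> model A -> model B -> model C"
```

The point of the overlap is that, if the RDU's data movement is fast enough, the swap cost hides behind compute and each phase sees its expert already resident.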