Need to know more about lesser-known engines (ik_llama.cpp, exllamav3..)

Posted by Leflakk@reddit | LocalLLaMA | View on Reddit | 28 comments

I usually stick to llama.cpp and vllm, but llama.cpp's speed may not be the best, and vllm/sglang can be really annoying if your GPU count isn't a power of 2 for tensor parallelism (tp).
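To illustrate the tp constraint: vLLM requires the model's attention-head count to be evenly divisible by the tensor-parallel size, which is why odd GPU counts (like 3) are often unusable. A minimal sketch (the helper name is hypothetical, not a vLLM API):

```python
# Hypothetical sketch of why vLLM rejects some tensor-parallel sizes:
# the number of attention heads must be divisible by tp, so a 3-GPU box
# often can't run tp=3 on common models (e.g. 32 heads).

def valid_tp_sizes(num_attention_heads: int, num_gpus: int) -> list[int]:
    """Return tensor-parallel sizes (<= num_gpus) that evenly split the heads."""
    return [tp for tp in range(1, num_gpus + 1) if num_attention_heads % tp == 0]

# A 32-head model on a 3-GPU machine: tp=3 is invalid, only 1 or 2 work.
print(valid_tp_sizes(32, 3))   # → [1, 2]
print(valid_tp_sizes(32, 8))   # → [1, 2, 4, 8]
```

So with 3 GPUs you either leave one idle (tp=2) or fall back to pipeline parallelism, which is the annoyance mentioned above.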

So, for people who really know other projects (I mainly know ik_llama and exl3): could you please share some feedback on where they really shine and what their main constraints are (model/hardware support, tool calling, stability…)?

Testing / understanding this stuff can take some time, so any useful info is good to have, thanks!