Stop Wasting Your Multi-GPU Setup With llama.cpp: Use vLLM or ExLlamaV2 for Tensor Parallelism

Posted by XMasterrrr@reddit | LocalLLaMA | 112 comments
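
For context on the post's thesis: tensor parallelism shards each layer's weight matrices across GPUs so that every GPU works on every token, whereas llama.cpp's default multi-GPU mode splits whole layers across GPUs, leaving them largely idle in turn. Below is a minimal sketch of enabling tensor parallelism in vLLM's Python API; the model name, GPU count, and prompt are placeholders, not details from the post.

```python
# Minimal vLLM tensor-parallelism sketch (assumes vLLM is installed and
# the machine has 2 GPUs; model name and prompt are placeholders).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,  # shard each layer's weights across 2 GPUs
)

params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
print(outputs[0].outputs[0].text)
```

The OpenAI-compatible server accepts the same setting as a flag, e.g. `vllm serve <model> --tensor-parallel-size 2`.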