backend-agnostic tensor parallelism has been merged into llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | 56 comments

If you have more than one GPU, your models can now run much faster: instead of only splitting layers across devices, the work within each layer can be split so the GPUs compute in parallel.
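As a rough sketch of how multi-GPU splitting is driven from the llama.cpp CLI: the long-standing `--split-mode` flag (with `row` splitting tensors across devices, as opposed to the default per-layer split) and `--tensor-split` (proportions of work per GPU) are the existing knobs. The exact flags the new backend-agnostic tensor-parallel path uses may differ; the model path below is a placeholder.

```shell
# Hypothetical invocation, assuming the pre-existing multi-GPU flags apply.
# -ngl 99          : offload (up to) all layers to the GPUs
# -sm row          : split individual tensors across GPUs (tensor-style split)
#                    rather than assigning whole layers per GPU (-sm layer)
# --tensor-split   : ratio of work per device (here 50/50 across 2 GPUs)
./llama-cli -m ./model.gguf -ngl 99 -sm row --tensor-split 1,1 -p "Hello"
```

Run `./llama-cli --help` after pulling the merged branch to see which split modes your build actually exposes.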