Is it possible to run a quantized Llama 70B model on a CPU / iGPU / APU?

Posted by grigio@reddit | hardware | View on Reddit | 7 comments

I can run it at 2 tokens/s, but I'd like to run it at 10 tokens/s or more. I don't want a GPU.
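For context, CPU-only inference of a quantized 70B model is typically done with llama.cpp. A minimal sketch of such an invocation is below; the model filename, quantization level (Q4_K_M), and thread count are assumptions, not taken from the post, so adjust them for your own setup.

```shell
# Hypothetical llama.cpp CPU run (flags: -m model path, -t CPU threads,
# -n tokens to generate, -p prompt). The GGUF filename is an assumed example.
# Set -t to your number of physical cores; 70B on CPU is usually
# memory-bandwidth bound, so more threads beyond that rarely helps.
./llama-cli -m ./llama-70b.Q4_K_M.gguf -t 8 -n 128 -p "Hello"
```

On a typical dual-channel desktop, a Q4 70B model streams roughly 35-40 GB of weights per token, which is why throughput lands in the low single digits of tokens/s without more memory bandwidth.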