Running on cpu :(

Posted by Frizzy-MacDrizzle@reddit | LocalLLaMA | View on Reddit | 4 comments

I am in the midst of a POC project at work and am I have is 4 AMD Epyc cores and those are essentially virtualized. Does any one have any tricks? Additionally kv cache sucks on system memory and have to clear it by adding ALL the no cache and sps 1 etc,. I have 32gb memory, loads the model fine, mistral 7b q4 k m.

To add, this is part of a RAG system and the context will get piped into the system prompt. I was on Ollama but have since moved to llama-server.

Please suggest and I will say of i tried, or will do. Close but yet not quality. Example, it’s not adding 8 records json with 4 columns name, company, balance, phone. The balance is always off and there is not a correlation to missing a balance.

I can’t really say exactly what I have tried, and not for solutions as it is probably working as much as it can, just tips, tricks, please.