Open WebUI, Ollama (ROCm) never-ending loop
Posted by supracode@reddit | LocalLLaMA | View on Reddit | 6 comments
I am pretty new to this setup. I just finished setting up a new R9700 on my Ubuntu server. I imported the 8-bit Gemma 4 model that I had downloaded for testing in LM Studio. I included four small config files in the context, and after a few prompts, got 100% GPU usage in a never-ending loop.

Is this related to context size, thinking, or something else?
supracode@reddit (OP)
Ok, building llama.cpp for vulkan now. Thanks all!
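For reference, a Vulkan build of llama.cpp typically follows the project's standard CMake flow. This is a sketch assuming the Vulkan SDK/headers are already installed; check the repo's build docs for your distro's exact packages:

```shell
# Typical Vulkan build of llama.cpp (assumes Vulkan SDK and CMake are installed)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j"$(nproc)"
```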
supracode@reddit (OP)
[ Prompt: 2100 t/s | Generation: 65 t/s ] using Ollama
vs
[ Prompt: 1685.6 t/s | Generation: 84.9 t/s ] using llama.cpp
nice improvement... thanks again for the advice!
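Working out the generation-speed gain from the numbers above (65 to 84.9 t/s):

```shell
# Percentage speedup in generation: (84.9 - 65) / 65 * 100, rounded
speedup=$(awk 'BEGIN { printf "%.0f", (84.9 - 65) / 65 * 100 }')
echo "${speedup}% faster generation"   # prints: 31% faster generation
```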
dreaddymck@reddit
You could try lowering the temperature, presence penalty, and a couple of other things. Most of that you would end up adding to the llama.cpp startup script.
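A sketch of what that startup script might look like with llama-server's sampling flags. The model path is hypothetical and the values are illustrative starting points, not recommendations:

```shell
# Launch llama-server with tamer sampling settings
# (model path is a placeholder; tune values for your use case)
./build/bin/llama-server \
  -m /models/gemma-q8_0.gguf \
  -c 8192 \
  --temp 0.7 \
  --repeat-penalty 1.1 \
  --presence-penalty 0.5
```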
That said, I switched to llama.cpp
deepspace86@reddit
I made this jump recently. Look at llama-swap. It still isn't quite as convenient for downloading models, but at least you can specify models directly from Hugging Face, and you can switch between models on the fly.
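As a sketch of the Hugging Face part: llama-server can pull a GGUF straight from a Hub repo with the `-hf` flag, which downloads and caches the model before serving it (the repo name below is illustrative):

```shell
# Download (and cache) a quantized model directly from Hugging Face, then serve it
./build/bin/llama-server -hf ggml-org/gemma-3-4b-it-GGUF --port 8080
```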
jacek2023@reddit
Uninstall ollama. Install llama.cpp. Be a happy person.
CalligrapherFar7833@reddit
Use llama.cpp