Open WebUI, Ollama (ROCm) never-ending loop
Posted by supracode@reddit | LocalLLaMA | View on Reddit | 6 comments
I am pretty new to this setup. I just finished setting up a new R9700 on my Ubuntu server. I imported the 8-bit Gemma 4 model that I had downloaded for testing in LM Studio. I included four small config files in the context, and after a few prompts, got 100% GPU usage in a never-ending loop.

Is this related to context size, thinking, or something else?
supracode@reddit (OP)
Ok, building llama.cpp for vulkan now. Thanks all!
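For reference, a Vulkan build of llama.cpp typically follows the project's standard CMake flow. This is a sketch assuming the Vulkan SDK/headers are already installed; check the repo's build docs for your distro's exact packages:

```shell
# Typical Vulkan build of llama.cpp (assumes Vulkan SDK and CMake are installed)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j"$(nproc)"
```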
supracode@reddit (OP)
[ Prompt: 2100 t/s | Generation: 65 t/s ] using Ollama
vs
[ Prompt: 1685.6 t/s | Generation: 84.9 t/s ] using llama.cpp
nice improvement... thanks again for the advice!
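Working out the generation-speed gain from the numbers above (65 to 84.9 t/s):

```shell
# Percentage speedup in generation: (84.9 - 65) / 65 * 100, rounded
speedup=$(awk 'BEGIN { printf "%.0f", (84.9 - 65) / 65 * 100 }')
echo "${speedup}% faster generation"   # prints: 31% faster generation
```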
dreaddymck@reddit
You could try lowering the temperature, presence penalty, and a couple of other things. Most of that you would end up adding to the llama.cpp startup script.
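A sketch of what that startup script might look like with llama-server's sampling flags. The model path is hypothetical and the values are illustrative starting points, not recommendations:

```shell
# Launch llama-server with tamer sampling settings
# (model path is a placeholder; tune values for your use case)
./build/bin/llama-server \
  -m /models/gemma-q8_0.gguf \
  -c 8192 \
  --temp 0.7 \
  --repeat-penalty 1.1 \
  --presence-penalty 0.5
```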
That said, I switched to llama.cpp
deepspace86@reddit
I made this jump recently. Look at llama-swap. It still isn't quite as convenient for downloading models, but at least you can specify models directly from Hugging Face, and you can switch between models on the fly.
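As a sketch of the Hugging Face part: llama-server can pull a GGUF straight from a Hub repo with the `-hf` flag, which downloads and caches the model before serving it (the repo name below is illustrative):

```shell
# Download (and cache) a quantized model directly from Hugging Face, then serve it
./build/bin/llama-server -hf ggml-org/gemma-3-4b-it-GGUF --port 8080
```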
jacek2023@reddit
Uninstall ollama. Install llama.cpp. Be a happy person.
CalligrapherFar7833@reddit
Use llama.cpp