Tried running UI-TARS 7B on Colab free T4 — OOM'd
Posted by Long_Respond1735@reddit | LocalLLaMA | 1 comment
Spent 30 minutes today trying to serve UI-TARS 1.5 7B via vLLM on Colab's free T4. OOM. The model weights alone are 14.2GB in FP16, and vLLM adds ~2GB overhead — T4 only has 15.6GB.
Switched to Ollama with a Q4 quant on Kaggle's free T4x2 and it worked fine. But I only figured this out after trial and error.
I know there are web-based VRAM calculators (apxml, gpuforllm, etc.), but they don't account for:
- Runtime overhead (vLLM vs Ollama vs llama.cpp — big difference)
- Vision model encoder overhead (VLMs need extra VRAM for the vision encoder on top of the language model)
- Auto-detecting your actual GPU
Is there a CLI tool that does something like:
check ui-tars-7b --gpu t4 --runtime vllm
→ ❌ won't fit (17.1GB needed, 15.6GB available)
→ try Q4 via Ollama instead (4.5GB)
Or does everyone just trial-and-error it?
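For what it's worth, the core check is just lookup-table math. Here's a rough sketch of what such a CLI could do under the hood — the overhead and bytes-per-param numbers below are assumptions pulled from this thread, not benchmarks, and `fits` is a hypothetical helper, not a real tool:

```python
# Rough VRAM-fit estimate. All constants are assumptions, not measured values.
RUNTIME_OVERHEAD_GB = {"vllm": 2.0, "ollama": 0.5, "llama.cpp": 0.3}
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.57}  # q4 ≈ Q4_K_M average

def fits(params_b, precision, runtime, gpu_vram_gb, vision_encoder_gb=0.0):
    """Return (fits, needed_gb): crude VRAM estimate vs. available VRAM."""
    weights_gb = params_b * BYTES_PER_PARAM[precision]
    needed = weights_gb + RUNTIME_OVERHEAD_GB[runtime] + vision_encoder_gb
    return needed <= gpu_vram_gb, round(needed, 1)

# 7B FP16 via vLLM on a 15.6GB T4, +1GB assumed for the vision encoder
print(fits(7, "fp16", "vllm", 15.6, vision_encoder_gb=1.0))   # won't fit
# Same model as a Q4 quant via Ollama
print(fits(7, "q4", "ollama", 15.6, vision_encoder_gb=1.0))   # fits easily
```

It ignores KV cache (which scales with context length and batch size), so real numbers will be higher — but even this crude version would have flagged my OOM before I burned 30 minutes on it.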
Status_Record_1839@reddit
Ollama with Q4 is the right call here. vLLM adds ~2GB overhead on top of weights, so 7B FP16 is always going to OOM on T4. For VLMs specifically the vision encoder eats another 1-2GB that most calculators ignore.