How to run Whisper Large-v3 on 4gb vram (in my case, 1050 Ti)

Posted by Normal-Ad-7114@reddit | LocalLLaMA

I decided to start this thread because I figured someone might search for "whisper large vram requirements" and be disappointed to learn that it needs 6gb+. On 4gb vram, the vanilla Whisper can't even load the 'medium' model, let alone the 'large' one.

After spending a good deal of time searching for a solution, I stumbled upon **whisper.cpp** by ggerganov - the genius behind ggml and numerous other amazing projects.

To run the large-v3 Whisper model on a 1050 Ti 4gb, you will need to:

1. Install CUDA
2. For Windows, download the latest release from [https://github.com/ggerganov/whisper.cpp/releases](https://github.com/ggerganov/whisper.cpp/releases) (look for whisper-cublas-x.x.x-bin-x64.zip)
3. For Linux installation, the quick start guide is at [https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file#quick-start](https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file#quick-start)
4. Next, download the model by running "**models\\download-ggml-model.cmd large-v3**" if you're on Windows, or "**./models/download-ggml-model.sh large-v3**" for Linux users
5. Then, you'll need to quantize the model. Execute "**quantize models/ggml-large-v3.bin models/ggml-large-v3-q8\_0.bin q8\_0**" in the command line (or "**./quantize ...**" for Linux)

That's it! To transcribe a .wav file, simply run "**main -m models\\ggml-large-v3-q8\_0.bin -f file.wav**" (or "**./main -m models/ggml-large-v3-q8\_0.bin -f file.wav**" on Linux).

Whisper.cpp can also do many other cool things. I highly recommend checking out the [documentation](https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file#whispercpp).
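For Linux users, the download, quantize and transcribe steps above can be strung together roughly like this. It's only a sketch: it assumes you run it from the whisper.cpp folder and that the binaries are called `./quantize` and `./main` as in the post (other builds may name or place them differently). The `DRY_RUN` variable and the `run` / `transcribe_pipeline` helpers are made up for this example, not part of whisper.cpp. By default it only prints the commands so you can check them first.

```shell
#!/bin/sh
# Sketch of the Linux steps from the post; run from the whisper.cpp checkout.
# DRY_RUN=1 (default) only prints each command; DRY_RUN=0 actually runs it.
# Assumption: binaries are ./quantize and ./main, as named in the post.

MODEL=large-v3
QUANT=q8_0
SRC="models/ggml-${MODEL}.bin"            # full-size model file
DST="models/ggml-${MODEL}-${QUANT}.bin"   # quantized model file

run() {
    # Hypothetical helper: echo the command, execute only when DRY_RUN=0.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

transcribe_pipeline() {
    wav="$1"
    # Step 4: download the model unless it is already there.
    [ -f "$SRC" ] || run ./models/download-ggml-model.sh "$MODEL"
    # Step 5: quantize to q8_0 unless the quantized file already exists.
    [ -f "$DST" ] || run ./quantize "$SRC" "$DST" "$QUANT"
    # Transcribe the given .wav file with the quantized model.
    run ./main -m "$DST" -f "$wav"
}

transcribe_pipeline "${1:-file.wav}"
```

Save it as e.g. `transcribe.sh` and run `DRY_RUN=0 sh transcribe.sh file.wav` once the printed commands look right. Note that the whisper.cpp README asks for 16-bit WAV input, so other formats may need converting first (for example with ffmpeg).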