Tutorial: How to Run DeepSeek-R1 (671B) 1.58bit on Open WebUI

Posted by yoracale@reddit | LocalLLaMA | View on Reddit | 41 comments

Hey guys! Daniel & I (Mike) at Unsloth collabed with Tim from Open WebUI to bring you this step-by-step on how to run the non-distilled DeepSeek-R1 Dynamic 1.58-bit model locally!

This guide is summarized so I highly recommend you read the full guide (with pics) here: https://docs.openwebui.com/tutorials/integrations/deepseekr1-dynamic/

To Run DeepSeek-R1:

1. Install Llama.cpp

2. Download the Model (1.58-bit, 131GB) from Unsloth

from huggingface_hub import snapshot_download snapshot_download(     repo_id="unsloth/DeepSeek-R1-GGUF",     local_dir="DeepSeek-R1-GGUF",     allow_patterns=["*UD-IQ1_S*"] )

DeepSeek-R1-GGUF/ ├── DeepSeek-R1-UD-IQ1_S/ │   ├── DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf │   ├── DeepSeek-R1-UD-IQ1_S-00002-of-00003.gguf │   ├── DeepSeek-R1-UD-IQ1_S-00003-of-00003.gguf

3. Install and Run Open WebUI

4. Start the Model Server with Llama.cpp

Now that the model is downloaded, the next step is to run it using Llama.cpp’s server mode.

🛠️Before You Begin:

  1. Locate the llama-server Binary
  2. If you built Llama.cpp from source, the llama-server executable is located in:llama.cpp/build/bin Navigate to this directory using:cd [path-to-llama-cpp]/llama.cpp/build/bin Replace [path-to-llama-cpp] with your actual Llama.cpp directory. For example:cd \~/Documents/workspace/llama.cpp/build/bin
  3. Point to Your Model Folder
  4. Use the full path to the downloaded GGUF files.When starting the server, specify the first part of the split GGUF files (e.g., DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf).

🚀Start the Server

Run the following command:

./llama-server \     --model /[your-directory]/DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \     --port 10000 \     --ctx-size 1024 \     --n-gpu-layers 40

Example (If Your Model is in /Users/tim/Documents/workspace):

./llama-server \     --model /Users/tim/Documents/workspace/DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \     --port 10000 \     --ctx-size 1024 \     --n-gpu-layers 40

✅ Once running, the server will be available at:

http://127.0.0.1:10000

🖥️ Llama.cpp Server Running

[After running the command, you should see a message confirming the server is active and listening on port 10000.](

Step 5: Connect Llama.cpp to Open WebUI

  1. Open Admin Settings in Open WebUI.
  2. Go to Connections > OpenAI Connections.
  3. Add the following details:
  4. URL → http://127.0.0.1:10000/v1API Key → none

Adding Connection in Open WebUI

Notes

If you have any questions please let us know and also - any suggestions are also welcome! Happy running folks! :)