How to run Hunyuan-Large (389B)? Llama.cpp doesn't support it
Posted by TackoTooTallFall@reddit | LocalLLaMA | View on Reddit | 11 comments
I have a homelab server that can run Llama 3.1 405B on CPU. I'm trying to run the new Hunyuan-Large 389B MoE model using Llama.cpp but I can't figure out how to do it.
If I try to use llama-cli with the FP8 Instruct safetensors files directly, I get "main: error: unable to load model". If I try to convert it to GGUF using convert_hf_to_gguf.py, I get "Model HunYuanForCausalLM is not supported".
How are others running this?
ambient_temp_xeno@reddit
I wish someone would test it, somehow!
arousedsquirel@reddit
Use the code sample in the model card: copy it into a large enough model and ask it to verify whether a Gradio interface is provided; otherwise, ask it to refactor the code to include one. Agents would be nice to have for automating test runs until the code works. And then you have your interface on your model, running locally.
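Something like this minimal sketch, assuming the model card's sample already gives you a working generation function (the function name `generate_reply` and the Gradio settings here are placeholders):

```python
import gradio as gr

def generate_reply(prompt: str) -> str:
    # Placeholder: call whatever generation code the model card's sample provides
    # (tokenize the prompt, run model.generate(), decode the output).
    raise NotImplementedError("wire this up to the model card's sample code")

# Wrap the generation function in a simple text-in / text-out web UI.
demo = gr.Interface(fn=generate_reply, inputs="text", outputs="text",
                    title="Hunyuan-Large (local)")
demo.launch()  # serves on http://127.0.0.1:7860 by default
```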
zotero-chatpdf@reddit
How is it possible to run a model this large locally?
Conscious_Cut_6144@reddit
He said he's doing CPU inference, so all that's needed is a few hundred gigs of RAM and a lot of patience.
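Rough back-of-the-envelope for the weights alone (ballpark numbers, ignoring KV cache and other overhead):

```python
# Ballpark weight-memory estimate for a 389B-parameter model at different precisions.
# Weights only; KV cache and runtime overhead come on top of this.
params = 389e9
for name, bytes_per_param in [("FP16", 2.0), ("Q8", 1.0), ("Q4", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB")
# FP16: ~778 GB, Q8: ~389 GB, Q4: ~195 GB
```

And since it's an MoE with roughly 52B active parameters per token, CPU generation should be faster than a dense model of the same total size, though you still need RAM for all of the weights.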
zotero-chatpdf@reddit
Guess it would take at least 30 seconds per request.
AmericanNewt8@reddit
I'm afraid your only option this early, unless you want to do the legwork yourself, is going to be straight transformers. It's not the worst out there, but it's nowhere near as performant as vLLM, llama.cpp, or the other custom engines.
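Roughly this shape, as a sketch; the repo id and exact arguments are assumptions on my part, so check the model card for the real snippet:

```python
# Minimal sketch: running Hunyuan-Large with plain transformers on CPU.
# The repo id below is an assumption -- use whatever the model card gives you.
# Custom architectures like HunYuanForCausalLM need trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Tencent-Hunyuan-Large"  # placeholder: check the model card

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # loads on CPU by default; needs RAM for all weights
    trust_remote_code=True,
)

prompt = "Explain mixture-of-experts models in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

You'd `pip install torch transformers` (plus whatever else the card lists) before running it, and expect it to be slow on CPU.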
TackoTooTallFall@reddit (OP)
Is there a guide on how to use this? I've never heard of it
arthurwolf@reddit
Typically, the GitHub, Hugging Face, or website of the model will have a simple Python code example that shows how to run the model. Typically, that code will use the `transformers` library. So the plan is, essentially: find instructions, follow instructions. Do whatever they tell you to do (often `pip install` some stuff, copy the script, paste the script into a file, and do `python script.py`). And then it should run. In theory.
TheDreamWoken@reddit
Be the change you want to see
fallingdowndizzyvr@reddit
If you want to run it with llama.cpp, you have to go here and open an issue. Then hopefully someone will deem it important enough to implement.
https://github.com/ggerganov/llama.cpp/issues
TackoTooTallFall@reddit (OP)
Yes, makes sense. Thank you for weighing in.