How to run Hunyuan-Large (389B)? Llama.cpp doesn't support it

Posted by TackoTooTallFall@reddit | LocalLLaMA | 11 comments

I have a homelab server that can run Llama 3.1 405B on CPU. I'm trying to run the new Hunyuan-Large 389B MoE model with llama.cpp, but I can't figure out how to do it.

If I try to use llama-cli with the FP8 Instruct safetensors files directly, I get "main: error: unable to load model". If I try to convert it to GGUF using convert_hf_to_gguf.py, I get "Model HunYuanForCausalLM is not supported".
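
As far as I can tell, the converter is just reading the architecture string out of the checkpoint's config.json and not finding a handler registered for it. Here's the quick check I did (the model directory is just my local download path, adjust to yours):

```python
# minimal sanity check: print the architecture string that
# convert_hf_to_gguf.py looks up when picking a model class
# ("./Hunyuan-Large-Instruct-FP8" is just where I downloaded the weights)
import json
from pathlib import Path

model_dir = Path("./Hunyuan-Large-Instruct-FP8")
config = json.loads((model_dir / "config.json").read_text())
print(config["architectures"])  # -> ['HunYuanForCausalLM']
```

So I assume this isn't a quantization or file-format problem on my end; the HunYuanForCausalLM architecture just isn't registered in the converter (or in llama.cpp itself) yet.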

How are others running this?