Unsloth just released their GGUF of Kimi-K2-Instruct-0905!
Posted by TheAndyGeorge@reddit | LocalLLaMA | View on Reddit | 41 comments
TheAndyGeorge@reddit (OP)
...time to go find some more RAM.
townofsalemfangay@reddit
And here I was hoping to run Q4 comfortably like I did with DS V3.1... god damn, 1.09 TB, that's a chonky boy for sure.
No_Afternoon_4260@reddit
What hardware/speeds for DS?
Dimi1706@reddit
For now there is no Q4... Let's wait a little, maybe they add more quants
townofsalemfangay@reddit
Oh definitely. I'm waiting on that and Qwen 3 Max (it's actually insanely good at conversation).
DragonfruitIll660@reddit
Gonna run it from NVMe, should expect the first response in a few days!
TheAndyGeorge@reddit (OP)
https://i.imgur.com/LACqJAR.jpeg
Marksta@reddit
Really excited for this one, the previous K2 is a beast 🤩 /u/VoidAlchemy, we're once again asking for your support 🙏
VoidAlchemy@reddit
and so it begins... again... haha xD
Thireus@reddit
🙏
emsiem22@reddit
Almost... it would almost fit in /s
createthiscom@reddit
Waiting for `Q4_K_XL`, personally.
cms2307@reddit
Only 500 gigabytes of RAM needed for that one.
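Back-of-envelope, that checks out (assuming ~1T parameters and ~4.5 bits/weight effective for Q4_K_XL; both are rough figures, not the actual upload size):

```sh
# ~1000B params x ~4.5 bits/weight / 8 bits per byte, in GB (integer math)
echo "$(( 1000 * 45 / 80 )) GB"   # ≈ 562 GB
```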
createthiscom@reddit
I've got 768GB, so yeah.
cms2307@reddit
DDR4 or DDR5? What kind of speeds are you getting? I wish I could run this one.
createthiscom@reddit
DDR5 5600 MT/s, 24 channels. It also has a Blackwell 6000 Pro. You can see the previous Kimi K2 model running here: https://youtu.be/eCapGtOHG6I?si=fXWLU4Dv0dHxXzS0&t=1704
PC Build and CPU-only inference: https://youtu.be/v4810MVGhog
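For context, the theoretical peak memory bandwidth of 24 channels of DDR5-5600 (assuming 8 bytes per channel per transfer; sustained real-world throughput lands well below peak):

```sh
# channels x MT/s x bytes per transfer, in GB/s
echo "$(( 24 * 5600 * 8 / 1000 )) GB/s"   # ≈ 1075 GB/s theoretical peak
```

Bandwidth is usually the bottleneck for CPU inference on a model this size, which is why a setup like that gets usable t/s.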
cms2307@reddit
Impressive!
TheAndyGeorge@reddit (OP)
and here I am, buying a 2x32GB DDR4 kit to double my system RAM
jacek2023@reddit
How do you run it locally? I can't and I have 3x3090
TheAndyGeorge@reddit (OP)
have you considered 30x3090s? That'd get you partway there!
jacek2023@reddit
So how do you use this file?
TheAndyGeorge@reddit (OP)
If you're looking for a serious answer, then yeah you'd need a lot of beefy hardware. 1TB of system RAM is certainly doable, if expensive. I think the point is that it is still possible to run this model locally.
jacek2023@reddit
That wasn't my question :)
TheAndyGeorge@reddit (OP)
You're asking how to use a GGUF? I use Ollama, so it's as simple as `ollama run hf.co/unsloth/Kimi-K2-Instruct-0905-GGUF:Q8_0`. I assume llama.cpp and vLLM or anything else that can consume GGUFs should be able to handle it too; a rough llama.cpp equivalent is below.
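(Untested sketch; assumes a recent llama.cpp build with the `-hf` auto-download flag, and the `-ngl`/`-c` values are placeholders to tune.)

```sh
# Downloads the Q8_0 shards from Hugging Face on first run, then chats.
# -ngl 99 offloads as many layers as fit in VRAM; lower it if you run out.
llama-cli -hf unsloth/Kimi-K2-Instruct-0905-GGUF:Q8_0 -ngl 99 -c 8192
```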
jacek2023@reddit
Can you show your t/s?
TheAndyGeorge@reddit (OP)
I do not have 1TB of RAM or VRAM.
jacek2023@reddit
So how do you use this file???
cms2307@reddit
Fuck off, do your own research
Fine-Will@reddit
> If you're looking for a serious answer, then yeah you'd need a lot of beefy hardware. 1TB of system RAM is certainly doable, if expensive. I think the point is that it is still possible to run this model locally.
Vatnik_Annihilator@reddit
Were you dropped on your head as a child? They've answered your question multiple times.
Unless you have an extremely expensive professional setup, you will not be able to run this model. It is not meant for people to run on their home computers. There are much smaller models for that purpose.
TheAndyGeorge@reddit (OP)
Have 1TB of RAM or VRAM.
townofsalemfangay@reddit
I laughed harder at this than I should have lol
CheatCodesOfLife@reddit
Yeah, we've got to wait for ubergarm to do ik_llama.cpp quants for CPU+GPU
relmny@reddit
Offloading layers to the CPU. Look at Unsloth's docs (there should be a link on their Hugging Face page). Something like the sketch below:
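(Untested sketch; the model path is a placeholder, and the `-ot` regex follows Unsloth's usual MoE guidance of keeping the expert tensors in system RAM; check their docs for the exact flags.)

```sh
# Offload everything to GPU except the MoE expert tensors, which stay on CPU.
# <path-to-first-shard>.gguf is a placeholder; llama.cpp finds the other shards.
llama-cli -m <path-to-first-shard>.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU"
```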
Sol_Ido@reddit
Incredible! Tell your kids
TheAndyGeorge@reddit (OP)
he's a toddler, so he laughed maniacally and then threw my wireless mouse down a flight of stairs
relmny@reddit
Not there yet, just a "placeholder". I've been refreshing that page for about 3 hours...
TheAndyGeorge@reddit (OP)
https://huggingface.co/unsloth/Kimi-K2-Instruct-0905-GGUF/tree/main/Q8_0 The sharded files were uploaded less than an hour ago, but they are there.
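If you only want those shards, something like this should work (sketch; assumes the `huggingface-cli` tool and that the `Q8_0/` folder layout stays as-is):

```sh
# Download just the Q8_0 folder from the repo into ./kimi-k2
huggingface-cli download unsloth/Kimi-K2-Instruct-0905-GGUF \
  --include "Q8_0/*" --local-dir kimi-k2
```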
relmny@reddit
you're right!
must've been the cache...
I still need to wait for Q2... so I need to be patient.
TheAndyGeorge@reddit (OP)
I'm gonna need a Q0.0005 release for my crappy hardware
AppealThink1733@reddit
When will there be a Kimi K2 quantized to 7B?