You can now check if your Laptop/ Rig can run a GGUF directly from Hugging Face! 🤗
Posted by vaibhavs10@reddit | LocalLLaMA | 64 comments
tilmx@reddit
Hey u/vaibhavs10 - great feature! Small piece of feedback: I'm sure you know, but many of the popular models will have more GGUF variants than can be displayed on the sidebar:
Clicking on the "+2 variants" takes you to the "files and versions" tab, which no longer includes compatibility info (unless I'm missing something?) Do you have any plans to add it there? Alternatively, you could have the Hardware compatibility section expand in place.
vaibhavs10@reddit (OP)
Hey hey, I'm VB, GPU Poor @ Hugging Face. Starting today, you can check hardware compatibility directly from the Hugging Face page of any GGUF. All you need to do is update your hardware specifications here: https://huggingface.co/settings/local-apps and then, for any GGUF across quant types, it should tell you whether you can run it or not.
Take it out for a spin and let us know what you think!
10minOfNamingMyAcc@reddit
It doesn't account for multiple GPUs
10minOfNamingMyAcc@reddit
Text Generation
Browse compatible models
So for Koboldcpp, should I select llama.cpp?
Frankie_T9000@reddit
Going to look tonight, but quick question: does it support multiple PCs, or do you need to change the config on the fly?
vaibhavs10@reddit (OP)
Yes, it should.
Frankie_T9000@reddit
Thanks, had a look. To confirm: it's hardware-instruction based for the CPU, not memory based?
Liringlass@reddit
Can it instead give my machine the VRAM it desperately needs? :D
DegenerativePoop@reddit
Any plans on updating to include the newest GPUs such as the 9070/9070xt?
MegaBytesMe@reddit
Very cool! Missing my Nvidia Quadro RTX 3000 though (in my Surface Book 3), and ARM-based processors (Snapdragon X Elite etc.).
vaibhavs10@reddit (OP)
aha! do you mind opening a PR here: https://github.com/huggingface/huggingface.js/blob/1aa1c3f4d2081b270517219c49c95c1d8d7fc682/packages/tasks/src/hardware.ts and tagging me on the PR `Vaibhavs10` 🙏
MegaBytesMe@reddit
Sure thing!
abitrolly@reddit
I feel like pasting `fastfetch` output would be the best UI. I am not sure which i5 generation this is.
CPU: Intel(R) Core(TM) i5-4300U (4) @ 2.90 GHz
LagOps91@reddit
If I might suggest something: would it be possible to use this information during model search? For instance, I'd want to look for models where I can run at least Q4, and I'm not interested in whether I can run Q8 or larger (since those typically get outperformed by larger models at lower quants).
vaibhavs10@reddit (OP)
good feedback - will iterate on this with the team
Devonance@reddit
Love this. Making the starting bar lower for hobbyists.
Any chance of getting AWQ to be added?
Also, a "default" option for the graphics card (or CPU) that shows up first when calculating? It pulls my RTX A4500 before my 2x 4090s, so I have to adjust it every time. (I did just rearrange them in the hardware settings; it uses whichever was added first.)
Also, maybe in the far future, adding in the PCIe lane count and then giving an estimate of tokens/sec (a rough estimate, since other factors would affect this).
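For a sense of what such an estimate could look like: decode speed on a dense model is roughly memory-bandwidth bound, since each generated token streams the whole model through memory once. A back-of-the-envelope Python sketch (the specific numbers are illustrative assumptions, not measurements):

```python
def rough_tokens_per_sec(model_size_gb: float, mem_bandwidth_gbps: float) -> float:
    """Upper-bound decode speed for a dense model: each token reads all
    weights once, so generation is approximately memory-bandwidth bound.
    Ignores KV cache traffic, PCIe transfers, batching, and MoE sparsity."""
    return mem_bandwidth_gbps / model_size_gb

# e.g. a ~13 GB Q4 quant on a GPU with ~1000 GB/s of memory bandwidth
print(round(rough_tokens_per_sec(13, 1000), 1))  # 76.9 tok/s, as an upper bound
```

Real throughput sits below this bound, and PCIe lane count mostly matters once layers spill across devices or into system RAM.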
vaibhavs10@reddit (OP)
We'll increase the coverage of models supported, yes, but a bit slowly with wherever the community wants us to go next - keep the feedback coming!
strategos@reddit
Key question though - Are you really GPU poor? :)
vaibhavs10@reddit (OP)
haha, you can look at my GPU stack here: https://huggingface.co/reach-vb
Euphoric-Bullfrog525@reddit
Hi, I'm a beginner starting out. I made an account on hugging face and added my hardware specs, but I'm not seeing the GGUF interface on the model page here:
_harias_@reddit
There is a separate page for the GGUF versions: https://huggingface.co/Qwen/QwQ-32B-GGUF
Euphoric-Bullfrog525@reddit
Thanks!
noneabove1182@reddit
Such an awesome QoL upgrade. Great at-a-glance info; even if it doesn't give the full story, it'll make life so much easier for a lot of people!
AstroEmanuele@reddit
Idk if it's a bug, but somehow a Ryzen 9 7000 series with 16 GB of RAM is only 0.56 TFLOPS, while my Ryzen 7 5000 series with the same amount of RAM is almost three times higher at 1.33 TFLOPS. Is that actually correct?
AstroEmanuele@reddit
u/vaibhavs10
xqoe@reddit
I need a way to calculate BPS and FLOPS per token per second for a chosen model, so I can check whether my hardware can run it.
carvengar@reddit
What does it mean that you can 'run' the LLM?
Because 1 token a second isn't usable, but it still 'runs'.
Delicious_One_7887@reddit
I have a total of 2.60 TFLOPS of computing power according to this
ParaboloidalCrest@reddit
Damn! That's a shot at LMStudio!
vaibhavs10@reddit (OP)
Eh, not really! We love LM Studio and chat with them almost every day. In fact, you can open a GGUF directly from the model page in LM Studio as well!
ParaboloidalCrest@reddit
I was kidding. Thanks for the feature.
sunpazed@reddit
This is really great, well done to you and the team!
AlphaPrime90@reddit
The CPU and GPU buttons don't work in Firefox. Will try again tomorrow.
drink_with_me_to_day@reddit
There's a bug, my 750 ti isn't in the dropdown
vaibhavs10@reddit (OP)
we should fix this - do you mind opening a PR here: https://github.com/huggingface/huggingface.js/blob/1aa1c3f4d2081b270517219c49c95c1d8d7fc682/packages/tasks/src/hardware.ts and tagging me on the PR `Vaibhavs10` 🙏
drink_with_me_to_day@reddit
I was joking, as I think the 750 Ti is too old to run anything.
MagicaItux@reddit
I think you have bigger issues
LA_rent_Aficionado@reddit
Would love for this to be able to tell you the max context you could run
vaibhavs10@reddit (OP)
good feedback
LA_rent_Aficionado@reddit
Happy to help! And maybe even an estimated GPU layer offload count… I feel like LM Studio and KoboldCpp really drop the ball when calculating these for multi-GPU setups.
Keep up the good work!
das_rdsm@reddit
Maybe add MLX as well? GGUF is not great for Apple Silicon.
vaibhavs10@reddit (OP)
on the list, yes!
DerfK@reddit
Pretty neat. If it expanded beyond GGUF quants, it could become really useful guidance, e.g. if I have multiple video cards, recommending quants and software that implement tensor parallelism to use the hardware to the fullest. Of course, then someone would have to keep track of all the feature compatibilities, but at least it wouldn't be me :D
vaibhavs10@reddit (OP)
yeah! definitely want to increase to more model types soon
fuutott@reddit
Please add rtx pro 6000
vaibhavs10@reddit (OP)
Sure, do you mind opening a PR here: https://github.com/huggingface/huggingface.js/blob/1aa1c3f4d2081b270517219c49c95c1d8d7fc682/packages/tasks/src/hardware.ts and tagging me on the PR `Vaibhavs10` 🙏
panchovix@reddit
This is great! But for multi-GPU, it doesn't seem to sum up? I have multiple different GPUs, but it only lets me choose one of them to evaluate.
vaibhavs10@reddit (OP)
looking into it with the team
puncia@reddit
A very good addition to this would be a suggested number of gpu layers to offload when using cpu + gpu inference, as I'm sure many of us do
vaibhavs10@reddit (OP)
yes! this is a first edition - will iterate a bit in the future :D
draetheus@reddit
Interesting idea, although I'd say this is highly variable depending on how much context you run with. I usually keep my KV cache in RAM as well (rather than VRAM), so I can maximize the quant quality within 12 GB of VRAM.
For instance, I can run Mistral Small 3.1 at IQ4_XS in 12 GB of VRAM, although this tool says Q3_K_S is the limit.
ParaboloidalCrest@reddit
How do you do that?
draetheus@reddit
If you're directly using llama.cpp the flag is `-nkvo`. Not sure what it is for the various llama.cpp wrappers or other frameworks.
ParaboloidalCrest@reddit
I badly wanted this to work. While it does keep the KV cache in system memory, it slows down token generation significantly, not only prompt processing.
draetheus@reddit
Huh, I don't get nearly that level of slowdown, but then again my usage involves relatively small contexts and prompts.
ParaboloidalCrest@reddit
Same here. I just tested with "Hello". Perhaps it has to do with the backend (Vulkan).
draetheus@reddit
Possibly, I'm using CUDA, I've never tested Vulkan.
Marksta@reddit
Do you still get good performance running like that?
draetheus@reddit
IMO yes, token generation is always the bottleneck so having slightly slower prompt processing to fit better quants is worth it.
popiazaza@reddit
Awesome.
I spent way too much time early in my local LLM journey learning how to calculate memory usage for each GGUF model.
It's pretty straightforward, but there isn't much information floating around. I got confused when I searched for something like "llama 3.1 8b vram requirement".
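The calculation in question is roughly: the weights take params × bits-per-weight / 8 bytes, plus some headroom for the KV cache and compute buffers. A minimal Python sketch, where the bits-per-weight and overhead values are assumptions rather than exact figures:

```python
def gguf_memory_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 1.5) -> float:
    """Rule-of-thumb memory footprint for a quantized model.

    params_b is the parameter count in billions; weights take
    params * bits / 8 bytes, plus a flat fudge factor for the KV cache
    and buffers (in reality this part grows with context length).
    """
    weights_gb = params_b * bits_per_weight / 8
    return weights_gb + overhead_gb

# e.g. an 8B model at Q4_K_M (~4.8 bits/weight is an assumed average)
print(round(gguf_memory_gb(8.0, 4.8), 1))  # 6.3
```

That puts a Q4-ish 8B quant at roughly 6-7 GB, which matches the common advice that it fits comfortably on an 8 GB card at modest context lengths.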
Emport1@reddit
Really nice thanks
ThiccStorms@reddit
Amazing
-Cubie-@reddit
I love it, very nice!
Stepfunction@reddit
This is so cool! Great work!