TheaterFire

Local LLaMA-3-8B on an Intel GPU and GPT-4o/mini: a speed comparison.

Posted by qnixsynapse@reddit | LocalLLaMA | 13 comments

13 Comments

trololololo2137@reddit

4o mini is better than any 8B model

carnyzzle@reddit

That's not the point; OP is showing how fast it is compared to 4o.

trololololo2137@reddit

But it's a meaningless comparison. GPT-2 124M runs very fast on my phone compared to GPT-4 on 8xH100 in the cloud, but who cares?

LinkStreet1167@reddit

Which Intel GPU?

qnixsynapse@reddit (OP)

Intel Arc GPU; w/o flash attention

sampdoria_supporter@reddit

There are multiple models of the Arc GPU. Which one is this?

ab2377@reddit

Also, which exact quant?

qnixsynapse@reddit (OP)

Q4_K quant

ab2377@reddit

Alright, the speed is OK for Q4. How much VRAM does this have, and what's the model/PC?

qnixsynapse@reddit (OP)

8GB VRAM (A750) ... This is without flash attention. I bought this GPU because I wanted to try it out (apart from AMD and Nvidia); I never thought I would be using it to run LLMs in 2024.

eposnix@reddit

I'm guessing this is free ChatGPT. Paid gpt-4o-mini is so fast the screen can't keep up.

nananashi3@reddit

What T/s though?

Spare-Abrocoma-4487@reddit

Which answer is correct, though?