Local LLaMA-3-8B on an Intel GPU and GPT-4o/mini: a speed comparison.

trololololo2137@reddit

4o mini is better than any 8B model

carnyzzle@reddit

That's not the point, OP is showing how fast it is compared to 4o

trololololo2137@reddit

But it's a meaningless comparison. GPT-2 124M runs very fast on my phone compared to GPT-4 on 8xH100 in the cloud, but who cares?

LinkStreet1167@reddit

Which Intel GPU?

qnixsynapse@reddit (OP)

Intel Arc GPU; w/o flash attention

sampdoria_supporter@reddit

There are multiple models of the Arc GPU - which one is this?

ab2377@reddit

Also, which exact quant?

qnixsynapse@reddit (OP)

Q4_K quant

ab2377@reddit

Alright, the speed is OK for Q4. How much VRAM does this have, and what's the model/PC?

qnixsynapse@reddit (OP)

8GB VRAM (A750) ... This is without flash attention. I bought this GPU because I wanted to try it out (apart from AMD and Nvidia); I never thought I would be using it to run LLMs in 2024.

eposnix@reddit

I'm guessing this is free ChatGPT. Paid gpt-4o-mini is so fast the screen can't keep up.

nananashi3@reddit

What T/s though?

Spare-Abrocoma-4487@reddit

Which answer is correct, though?
