Local LLaMA-3-8B on an Intel GPU and GPT-4o/mini: a speed comparison. Posted by qnixsynapse@reddit | LocalLLaMA | 13 comments
trololololo2137@reddit: 4o mini is better than any 8B model
carnyzzle@reddit: That's not the point, OP is showing how fast it is compared to 4o
trololololo2137@reddit: But it's a meaningless comparison. GPT-2 124M runs very fast on my phone compared to GPT-4 on 8xH100 in the cloud, but who cares?
LinkStreet1167@reddit: Which Intel GPU?
qnixsynapse@reddit (OP): Intel Arc GPU; w/o flash attention
sampdoria_supporter@reddit: There are multiple models of the Arc GPU - which one is this?
ab2377@reddit: Also, which exact quant?
qnixsynapse@reddit (OP): Q4_K quant
ab2377@reddit: Alright, the speed is OK for Q4. How much VRAM does it have, and what's the model/PC?
qnixsynapse@reddit (OP): 8GB VRAM (A750) ... This is without flash attention. I bought this GPU because I wanted to try it out (apart from AMD and Nvidia); I never thought I would be using it to run LLMs in 2024.
eposnix@reddit: I'm guessing this is free ChatGPT. Paid gpt-4o-mini is so fast the screen can't keep up.