An interesting challenge: squeeze as much juice as possible out of the Qwen2.5 0.5B model
Posted by ANR2ME@reddit | LocalLLaMA | 7 comments
https://www.h2loop.ai/contests/bear-the-tokens
Someone was able to get more than 5k tok/s on a T4 GPU 😯
FusionCow@reddit
Optimizing is always cool, but on a model so useless, you gotta wonder why
BigYoSpeck@reddit
Reminder that race walking is an Olympic sport
jmprog@reddit
Oh cool, how do we see the code that got it to that speed?
ANR2ME@reddit (OP)
I don't think you can see it during the contest; otherwise all the new contestants would just copy the No. 1 entry's code 😂
But hopefully they release the code at the end of the contest 🤔
Inevitable-Log5414@reddit
Someday we'll be able to run an LLM on a router)
julp@reddit
Doesn't get more "edge" than that
ANR2ME@reddit (OP)
Considering that an enterprise router can have 16 GB of RAM and a 16-core CPU, I guess it could happen 🤔 https://mikrotik.com/product/ccr2216_1g_12xs_2xq