An interesting challenge: squeeze as much juice as possible out of the Qwen2.5 0.5B model

https://www.h2loop.ai/contests/bear-the-tokens

Someone was able to get more than 5k tok/s on a T4 GPU 😯

Optimizing is always cool, but on a model so useless, you gotta wonder why

Reminder that race walking is an Olympic sport

Oh cool, how do we see the code that got it to that speed?

I don't think you can see it during the contest, otherwise every new contestant would just copy the No. 1 code 😂

But hopefully they release the code at the end of the contest 🤔

Someday we'll be able to run LLMs on a router :)

Doesn't get more "edge" than that

Considering that an enterprise router can have 16GB of RAM and a 16-core CPU, I guess it could happen 🤔 https://mikrotik.com/product/ccr2216_1g_12xs_2xq
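Quick back-of-envelope on whether the weights would even fit: a rough sketch, assuming ~0.5 billion parameters (taken from the "0.5B" in the model name) and counting weights only. KV cache, activations, and runtime overhead are ignored, and the precisions listed are just common examples, not what any contestant actually used.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights only (no KV cache, activations, or runtime)."""
    return num_params * bytes_per_param / 1e9

# Assumption: ~0.5e9 parameters, from the "0.5B" in the model name.
PARAMS = 0.5e9

# Common storage precisions and their size per parameter in bytes.
for name, nbytes in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{weight_memory_gb(PARAMS, nbytes):.2f} GB")
```

Even at fp16 that is roughly 1 GB of weights, well under 16GB of RAM, so memory probably isn't the bottleneck; CPU throughput would be.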