Under 3 second time to first token, I literally don’t know what to add or do next for my local LLM. Can I get some input on ways to improve it?

[-]

LLM Hub is which this was forked from 4 months ago. Unfortunately it’s a completely different program at this point but that’s what I’ll refer people to. I can release the APk or app very soon if the community wants to help me make it better

Reply

[-]

RickyRickC137@reddit

I don't understand why you are getting down voted! But yeah, please release the apk!

Reply

[-]

juss-i@reddit

It's the pattern. Instead of posting a link to your vibe project, you post a video with something cool happening, with little explanation so there's no chance of figuring out from the post itself what it actually is, then wait for a commenter to ask what it is. Then you post a link to your repo/product/whatever. It's like an ad that only becomes an ad if someone shows interest.

Reply

[-]

Fear_ltself@reddit (OP)

Yeah except when the community asked me to open source my shit last time I did gladly in 4 hours (never had done a successful repo before) and got 273 stars with 0 issues. https://github.com/CyberMagician/Project_Golem

Reply

[-]

LegacyRemaster@reddit

good... Another star from me

Reply

[-]

hergendy@reddit

Why not open source it then and share it with audiences who would care to join.

Reply

[-]

Fear_ltself@reddit (OP)

I plan to very soon, not trying to gatekeep it, just trying to get it nearly bug free and the UI decent. It's 4 months since my first day coding kotlin android.

Reply

[-]

bladezor@reddit

What model are you using?

Reply

[-]

Fear_ltself@reddit (OP)

In this video Gemma 4 E4B it from mlx community, 4bit with MTP, kv cache 4bit balanced hosted from lm studio from my MacBook Pro m3 pro 18Gb on an eero mesh network. ComfyUI running on a 4070mobile (Asus Zephyrus G16) SDXL turbo 1 step for image generation. The prompt is just all the research from the last 3 years rolled up into one allowing Gemma 4 to use image generation enhancement techniques like a master prompt engineer and send them to the 4070, then back to the edge device on the same network. I also have aicore working, and Gemma 4 e4b it running on the pixel 9 pro itself with absolutereality, but the result go from 20 second to 2 minutes moving the inference completely to edge. This is still local on my network, I also have options to route to other APIs and have them use the tools. Nvidia free api for example lets me run Mistrial large 3. Or Gemini api for Gemini 3.5 Flash. But those are free options, not local, so I didn’t post them. Theres literally over a dozen options or you can bring your own litertlm model, or any lm studio hosted model.

Reply

[-]

overand@reddit

Switch to Flux 2 Klein (9b or 4b) for your image gen in comfyUI - you'll probably want to the"GGUF Loader" node for ComfyUI though.

Reply

[-]

Fear_ltself@reddit (OP)

Already an option whatever you want to use on comfy UI slide a slider in the settings (as long as it’s on your comfyUI). Pretty sure I can get it to work for video and music generation in an afternoon now that I think about it, thanks for mentioning this it sparked an idea

Reply

[-]

Tall-Ad-7742@reddit

under 2 seconds time to first token

Reply

[-]

juss-i@reddit

Game over, you win. But if you insist: * Set it up so family / friends / community can use it too * Make money selling your knowledge * Hire someone to use it for you, so you never ever again have to * Scale up and compete against the giants Oh wait, this isn't one of those "what's the name of that UI" baits, is it?

Reply

[-]

Fear_ltself@reddit (OP)

No I literally want it to be local llamas open source main hub lol. I’ll release the .APK if the community wants it. It’s 4 months of work forking LLM Hub but at this point every feature has been completely renovated. Still cite their work in the OSS contributions though

Reply

Under 3 second time to first token, I literally don’t know what to add or do next for my local LLM. Can I get some input on ways to improve it?

Reply to Post

15 Comments

RickyRickC137@reddit

Fear_ltself@reddit (OP)

RickyRickC137@reddit

juss-i@reddit

Fear_ltself@reddit (OP)

LegacyRemaster@reddit

hergendy@reddit

Fear_ltself@reddit (OP)

bladezor@reddit

Fear_ltself@reddit (OP)

overand@reddit

Fear_ltself@reddit (OP)

Tall-Ad-7742@reddit

juss-i@reddit

Fear_ltself@reddit (OP)