
Under 3-second time to first token: I literally don't know what to add or do next for my local LLM. Can I get some input on ways to improve it?

Posted by Fear_ltself@reddit | LocalLLaMA | 15 comments


15 Comments

RickyRickC137@reddit

Is it available on Android?

Fear_ltself@reddit (OP)

LLM Hub is what this was forked from 4 months ago. Unfortunately it's a completely different program at this point, but that's what I'll refer people to. I can release the APK or app very soon if the community wants to help me make it better.

RickyRickC137@reddit

I don't understand why you are getting downvoted! But yeah, please release the APK!

juss-i@reddit

It's the pattern. Instead of posting a link to your vibe project, you post a video with something cool happening, with little explanation so there's no chance of figuring out from the post itself what it actually is, then wait for a commenter to ask what it is. Then you post a link to your repo/product/whatever. It's like an ad that only becomes an ad if someone shows interest.

Fear_ltself@reddit (OP)

Yeah, except last time, when the community asked me to open source my shit, I gladly did it in 4 hours (I'd never done a successful repo before) and got 273 stars with 0 issues. https://github.com/CyberMagician/Project_Golem

LegacyRemaster@reddit

Good... another star from me.

hergendy@reddit

Why not open source it, then, and share it with audiences who would care to join?

Fear_ltself@reddit (OP)

I plan to very soon. I'm not trying to gatekeep it, just trying to get it nearly bug-free and the UI decent. It's been 4 months since my first day coding Kotlin for Android.

bladezor@reddit

What model are you using?

Fear_ltself@reddit (OP)

In this video: Gemma 4 E4B-it from mlx-community, 4-bit with MTP, KV cache 4-bit (balanced), hosted in LM Studio on my MacBook Pro (M3 Pro, 18 GB) over an eero mesh network. ComfyUI runs on an RTX 4070 mobile GPU (Asus Zephyrus G16) with SDXL Turbo at 1 step for image generation. The prompt is basically all the research from the last 3 years rolled into one, which lets Gemma 4 apply image-generation prompt-enhancement techniques like a master prompt engineer and send the result to the 4070; the image then comes back to the edge device on the same network. I also have AICore working, and Gemma 4 E4B-it running on the Pixel 9 Pro itself with AbsoluteReality, but moving inference completely to the edge takes the results from 20 seconds to 2 minutes.

The setup in the video is still local to my network. I also have options to route to other APIs and have them use the tools. For example, the free NVIDIA API lets me run Mistral Large 3, or the Gemini API gives me Gemini 3.5 Flash. But those are free options, not local ones, so I didn't post them. There are literally over a dozen options, or you can bring your own LiteRT-LM model or any LM Studio-hosted model.

overand@reddit

Switch to Flux 2 Klein (9B or 4B) for your image gen in ComfyUI. You'll probably want to use the "GGUF Loader" node for ComfyUI, though.

Fear_ltself@reddit (OP)

That's already an option: whatever you want to use in ComfyUI, just move a slider in the settings (as long as it's installed in your ComfyUI). Pretty sure I can get it working for video and music generation in an afternoon, now that I think about it. Thanks for mentioning this, it sparked an idea.

Tall-Ad-7742@reddit

under 2 seconds time to first token

juss-i@reddit

Game over, you win. But if you insist:
* Set it up so family / friends / community can use it too
* Make money selling your knowledge
* Hire someone to use it for you, so you never ever have to again
* Scale up and compete against the giants

Oh wait, this isn't one of those "what's the name of that UI" baits, is it?

Fear_ltself@reddit (OP)

No, I literally want it to be LocalLLaMA's main open-source hub lol. I'll release the .APK if the community wants it. It's 4 months of work on a fork of LLM Hub, but at this point every feature has been completely reworked. I still credit their work in the OSS contributions, though.