PrismML just released Binary and Ternary Bonsai Image 4B: 1-bit/ternary text-to-image diffusion transformers that can even run 100% locally in your browser on WebGPU.
Posted by xenovatech@reddit | LocalLLaMA | View on Reddit | 75 comments
The PrismML team really cooked with these models. They're only ~3GB in size (compared to FLUX.2 Klein 4B, which is ~16GB). Apache-2.0!
Official collection on HF: https://huggingface.co/collections/prism-ml/bonsai-image
Link to demo: https://huggingface.co/spaces/webml-community/bonsai-image-webgpu
dh7net@reddit
I generated 192 images with Ternary Bonsai on my GX10 Spark.
You can see them here: https://imagebench.ai/gallery?v=hhhhhhhhhhhs.ssssss
AccountAntique9327@reddit
I'm assuming it uses balanced ternary?
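For context, balanced ternary means each weight takes one of three values, -1, 0 or +1, which carries log2(3) ≈ 1.58 bits of information. Below is a minimal sketch of how such weights could be stored four per byte using 2-bit codes. The code table and the function names `pack_ternary` and `unpack_ternary` are hypothetical and chosen for illustration; nothing here is confirmed to be the on-disk format the Bonsai checkpoints actually use.

```python
# Hypothetical 2-bit code table for the three ternary values (an assumption).
CODE = {-1: 0b00, 0: 0b01, 1: 0b10}
DECODE = {code: value for value, code in CODE.items()}


def pack_ternary(weights):
    """Pack ternary weights (-1, 0, +1) four per byte, lowest bits first."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= CODE[w] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_ternary(data, count):
    """Recover `count` ternary weights from packed bytes, ignoring padding."""
    weights = []
    for byte in data:
        for j in range(4):
            if len(weights) == count:
                return weights
            weights.append(DECODE[(byte >> (2 * j)) & 0b11])
    return weights


weights = [-1, 0, 1, 1, 0]
packed = pack_ternary(weights)             # 5 weights fit in 2 bytes
restored = unpack_ternary(packed, len(weights))
```

A 2-bit container wastes one of its four codes, so it spends 2 bits on about 1.58 bits of information; the trade is simpler, faster unpacking.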
Fun_Librarian_7699@reddit
My first thought was that you could use this model to make those cool pixel-block bonsai trees. Now I'm actually pretty disappointed with the model
Zulfiqaar@reddit
I thought that was really cool too! So I recreated it, Preview here and Repo Here
Yes-Scale-9723@reddit
That's so cool
Flamenverfer@reddit
This is pretty sick, Nice!
Clear-Ad-9312@reddit
One discrepancy I noticed between the video and your preview is that the "explosion"/expansion effect on mouse hover doesn't follow the surface of the object; in other words, it is stuck at a certain layer and does not dynamically pick the cubes closest to the camera view on mouse hover.
good work though
Zulfiqaar@reddit
Fixed! Using camera ray-intersect instead of trunk plane.
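To illustrate the idea of picking by camera ray rather than by a fixed plane, here is a minimal sketch in plain Python. It uses the standard slab test for ray versus axis-aligned box and keeps the hit with the smallest distance along the ray. The function names `ray_hits_box` and `pick_nearest_cube` and the cube layout are hypothetical; this is not the code from the repo, which presumably relies on a 3D library's own raycaster.

```python
import math


def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: return the entry distance along the ray, or None on a miss."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:
                return None  # ray is parallel to this slab and outside it
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:
            return None  # the slab intervals do not overlap
    return t_near


def pick_nearest_cube(origin, direction, cubes):
    """Return the index of the cube the ray enters first, or None."""
    best_index, best_t = None, math.inf
    for index, (box_min, box_max) in enumerate(cubes):
        t = ray_hits_box(origin, direction, box_min, box_max)
        if t is not None and t < best_t:
            best_index, best_t = index, t
    return best_index


# Three hypothetical cubes; the camera looks down +z from the origin.
cubes = [
    ((-0.5, -0.5, 4.0), (0.5, 0.5, 5.0)),  # behind the second cube
    ((-0.5, -0.5, 1.0), (0.5, 0.5, 2.0)),  # closest to the camera
    ((2.0, 2.0, 1.0), (3.0, 3.0, 2.0)),    # off to the side
]
hovered = pick_nearest_cube((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), cubes)
```

Because the nearest hit is chosen per ray, the hover effect follows whatever surface faces the camera instead of staying on one layer.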
My ccusage is saying $113 and 54Mtok for this and counting lol
MrYorksLeftEye@reddit
That's like $6 on Codex
Clear-Ad-9312@reddit
haha good job, I noticed it
I have switched completely to using kimi's subscription. I have moments that I have to manually steer it, but works great most of the time.
$113 for this small project, damn bro. Yet I wonder how some engineers are able to reach $1.3 million in a single month, claude code is just expensive. crazy lol
I have been thinking of signing up for cursor's subscription since it looks like they fine-tuned kimi's model to be exceptional and fast af.
__Maximum__@reddit
Can you make it alive so I can plant it on my desktop and see them grow? But if i forget to water, they should die, so it feels more real
AnOnlineHandle@reddit
This reminds me of the intro to Lionhead's Black & White 2 where you can move the thingies around and try to mess them up before the game seemingly decides to randomly end the short intro and tips them all over or something, which is oddly very fun.
__JockY__@reddit
Holy shit. Well played.
yuletide@reddit
What is with the excessive italic text on all these AI websites?
Icy-Pay7479@reddit
I swear I’ve seen this layout 3 times this week.
skinnyjoints@reddit
This is Claude’s default style
0xP3N15@reddit
It was cool when it started, but now everyone and their cat is doing it. Instrument Serif is the font.
epSos-DE@reddit
It's about 2GB to download!!!
But good to try!
FaatmanSlim@reddit
It's not that great. Here are 2 images I generated with it (4080 GPU, though I don't think that matters except for performance). I increased the step count from the default 4 to 10 (second image) and even 20 (first image), and these are what I got. These are the prompts I used for the two: "A Jedi Knight faces off against a Sith Lord in a remote planet" and "A young man piloting a giant robot in outer space"
Reminds me of the 1st generation of image generator models 2-3 years ago, not SOTA.
But very cool concept though! Being able to run models on your local GPU through WebGPU and a browser instead of having to go through an install process, or use cloud models.
ArchdukeofHyperbole@reddit
The images look nice imo for what the model is
aegismuzuz@reddit
With extreme compression down to 1 or 1.58-bit, a diffusion model loses the continuity of the latent space that’s needed for accurate prompt adherence and fine detail generation. Increasing the step count to 10 or 20 doesn’t really help, because the issue isn’t noise convergence - it’s the absence of parameters that encode high-frequency features in the first place. There’s no magic here: once you compress weights by 5x, you inevitably lose detail in complex compositions.
Natural-Rich6@reddit
Can it run on CPU with 16 GB of RAM?
Everlier@reddit
Unfortunately I can't see any launch path for CPU or AMD GPUs in the sample repo right now, it throws without Nvidia GPU, but it's early
Natural-Rich6@reddit
What about the new Intel and AMD CPUs that have an NPU?
Everlier@reddit
Official repo only has Nvidia GPU code, but granted there's a WebGPU demo and that it's based on Flux, I'm sure there's a path
Cool-Chemical-5629@reddit
I believe there's a saying: "Where there's a will, there's a thorny path leading to a dead end." Or something along those lines.
Everlier@reddit
Yeah, I've launched it on Strix Halo since
m31317015@reddit
If it runs on Strix Halo, then there's hope.
olliec42069@reddit
What about Macs with MLX?
Substantial_Swan_144@reddit
As you can see in the captions, the model is 3GB in size. So it should roughly take that amount of RAM, give or take.
camelos1@reddit
A warning: the model file (3 GB) is stored in the Chrome folder. If you have Chrome, do not forget to delete it after using the demo.
MarieDeVox@reddit
Looks pretty good based on the ‘ad’, but you never know until you actually use it. I'm still not loving the size, especially considering the download gigs, but it is better than some of the others in that regard.
Ok-Internal9317@reddit
I like the tree better
keyboardhack@reddit
Firefox defaults to CPU for me. Very slow. It works in Chrome but it quickly runs out of memory. There is probably a memory leak in their demo.
aegismuzuz@reddit
Firefox still only has experimental WebGPU support and often silently falls back to software rendering, which explains the slowdown. And Chrome crashes are basically just V8 architecture reality at this point. Without optimized layer streaming, browser-based demos like this are always going to be unstable.
StudentZuo@reddit
The browser/WebGPU part is the most interesting bit to me. If inference stays local, the demo becomes a much better evaluation loop: people can test latency, memory pressure, prompt adherence, and failure cases without setting up a Python stack or trusting a hosted endpoint.
For image models, I’d love to see a small “where it breaks” gallery: text in images, fine structure, multiple objects, hands/faces, and style consistency across seeds. That would make the 1-bit vs ternary tradeoff much easier to understand.
oxygen_addiction@reddit
This team is really shady. What they're calling "Bonsai-Image" is just a quantization of FLUX.2 Klein 4B with some post-training to recover performance.
They strategically omit any mention of the FLUX team or the original model.
Not on the Prism-ML HF Web demo page, not on the HF model pages, not on GitHub.
If it were just one place, I could understand, but this is a pattern. They did the same thing with Qwen before: called everything "Bonsai" and tried to distance themselves from the original model and team.
Zero attribution to the people who actually built this. It's disingenuous and completely against the open-source spirit. The only place the original model is mentioned is in the whitepaper, which they know most people will not read.
Don't support this team and their shitty practices.
hellynn@reddit
They literally disclose it’s built from FLUX.2 Klein 4B everywhere lol. Blogposts, whitepaper, model cards, tweets, Hugging Face notices. Acting like they’re “strategically hiding it” is such a reach.
Also the WebGPU demo was made by Hugging Face, not PrismML 😭
This community also loves inventing fake naming “rules” every other week like it’s the constitution or something. It’s open source. People quantize, finetune, remix, and rename models constantly. Focus on the science and the outputs instead of farming outrage over branding.
Getting a 3GB model down to ~1GB while keeping quality is genuinely hard. Anyone who’s tried recovering FLUX after aggressive quantization knows this is not trivial work.
Party-Special-5177@reddit
That’s an improvement then, as their original qwen3 binary/ternary quants were only disclosed in their white paper, on page 6, which was past the preview. Unless you downloaded it with the intent of reading it, you’d never know.
Glad they are trying to improve.
mz_gt@reddit
I don’t think their model card on huggingface uses the “model tree” feature to show that their models are derived from FLUX.2, to be fair. That would be a nice touch. But you’re right they’re not actively hiding it.
AnOnlineHandle@reddit
While you're right it should be attributed, I wonder if there's a point the community would consider a model reworked enough as in the case of Pony6 and Anima to not mention the base model, since it's been retrained or had the architecture changed to the point that it's no longer compatible with it for loras etc.
eposnix@reddit
Says Flux right there on the demo.
pigeon57434@reddit
i agree the naming of the model is weird, it should probably just be called FLUX.2-Klein-4B-1BIT or 1.58Bit respectively, but i mean they mention the fact it's based on FLUX in like every fucking possible instance, what are you talking about?
BurntUnluckily@reddit
Literally the first sentence.
Aaaaaaaaaeeeee@reddit
Yes, renaming them to Bonsai-FLUX.2-Klein-ternary, Bonsai-Qwen3-8B-binary for transparency would be better since everyone would readily see it as ternary conversion.
Gnatogryz@reddit
You're right, but on the other hand – isn't zero attribution the unofficial motto of the whole AI revolution?
AsparagusGlobal5036@reddit
Coaches don't play.
FastDecode1@reddit
no
Opposite_Parsley677@reddit
I mean the base architecture is mentioned in the HF model card and in the NOTICE. Also mentioned in the blog and whitepaper. They share some details on their approach and comparisons with the base model in the whitepaper as well.
mzzmuaa@reddit
i found it incidentally mentioned here: https://huggingface.co/prism-ml/bonsai-image-ternary-4B-gemlite-2bit "1.21 GB diffusion transformer, down from 7.75 GB for the FP16 FLUX.2 Klein 4B transformer"
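The two quoted figures (7.75 GB at FP16, 1.21 GB quantized) can be sanity-checked with some quick arithmetic. The sketch below assumes decimal gigabytes and that the FP16 file is essentially all weights at 2 bytes each; the helper `packed_size_gb` is a hypothetical name. Any gap between the 2-bit estimate and the quoted 1.21 GB would presumably come from scale factors and layers kept at higher precision, which is a guess and not something the model page states.

```python
import math

GB = 1e9  # assumption: sizes are quoted in decimal gigabytes

fp16_size_gb = 7.75       # quoted for the FP16 FLUX.2 Klein 4B transformer
quantized_size_gb = 1.21  # quoted for the ternary transformer

# FP16 stores 2 bytes per weight, so the parameter count follows directly.
params = fp16_size_gb * GB / 2


def packed_size_gb(num_params, bits_per_weight):
    """Size of the weights alone at a given storage width, in GB."""
    return num_params * bits_per_weight / 8 / GB


two_bit_gb = packed_size_gb(params, 2)                   # 2-bit container
ideal_ternary_gb = packed_size_gb(params, math.log2(3))  # ~1.58 bits per weight
ratio = fp16_size_gb / quantized_size_gb                 # overall compression

print(f"parameters: {params / 1e9:.3f}B")
print(f"2-bit weights only: {two_bit_gb:.2f} GB")
print(f"ideal ternary weights only: {ideal_ternary_gb:.2f} GB")
print(f"quoted compression ratio: {ratio:.1f}x")
```

Under these assumptions the transformer has just under 4 billion weights, and the 2-bit weights alone land a little below 1 GB, which is consistent with the quoted file being slightly larger.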
aegismuzuz@reddit
Curious how they handled the noise schedule at that level of aggressive quantization. The original FLUX works really well with low step counts, but once you compress it down to 1.58-bit precision, the model starts losing gradient accuracy in latent space
Icy-Reaction-9101@reddit
Thumbnail generator? Or does it support 4k images?
Napster3301@reddit
the "image quality is bad" takes are missing what's demonstrated here. ternary diffusion is harder than ternary text generation because noise prediction is continuous and high-frequency image detail dies first under aggressive quantization. getting recognizable outputs at all from 1.5gb of ternary weights is the technical story, not whether it beats sota.
the question isnt "can this replace flux" (no, not yet). its whether ternary scales. if you can recover most of fp16 quality at 1.5gb here, what happens when someone tries it on a 12b base? thats where this gets interesting.
techlatest_net@reddit
3GB for a text-to-image model that runs in-browser? That's actually insane.
TanJeeSchuan@reddit
Decent generations for a model that can fit in 6GB of VRAM. Too bad it sucks at UI icon generation, not my use case
exaknight21@reddit
This sub is getting salty by the second. Kudos to PrismML for trying. Bitnet is the future. And I’m here for it.
Ice_Falco@reddit
is there a good higher parameter model?
shockwaverc13@reddit
is the demo broken? it's OOMing the system when i have more than 8gb of ram
KURD_1_STAN@reddit
You can't call something state of the art when it is meant just to be small and still usable. And what is with comparing its size at q6 without the text encoder to FLUX Klein 4B at fp16 plus the text encoder at fp16, combined?
a_beautiful_rhind@reddit
What good are 1 bit image models? T2I have to be trained and have lora made. You can't get by on prompting for visuals like you can with text.
Twirrim@reddit
Theoretically the output is the same quality as higher-bit models. There's a whitepaper that goes into the details, somewhere, but it's something like taking the higher quality model, quantizing it, then tweaking the quantized model until it provides close to comparable performance. I experimented with the 8B model before, which was built from Qwen3-8B, and it seemed fine, used a lot less resources, and ran a lot faster. I didn't do any particularly in-depth testing, though.
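As a rough illustration of the "quantizing it" step only, here is a minimal sketch of absmean ternary quantization, the scheme described for BitNet b1.58: divide by the mean absolute weight, round, and clip to -1, 0 or +1. It is used here as a stand-in and is an assumption; the whitepaper's actual method may differ, and the follow-up "tweaking" (post-training to recover quality) is not shown. The function names are hypothetical.

```python
def quantize_ternary(weights):
    """Absmean ternary quantization: return (codes in {-1, 0, 1}, scale)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0  # all-zero input needs no scale
    # Round to the nearest integer, then clip into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Approximate the original weights from ternary codes and one scale."""
    return [c * scale for c in codes]


weights = [0.9, -0.05, -1.2, 0.3]
codes, scale = quantize_ternary(weights)
approx = dequantize(codes, scale)

# Mean absolute error introduced by quantization, before any recovery training.
error = sum(abs(w - a) for w, a in zip(weights, approx)) / len(weights)
print(codes, scale, error)
```

The error printed at the end is the gap that the post-quantization training would have to close.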
a_beautiful_rhind@reddit
Maybe the lora from the higher Q will transfer over? But if you train it directly I assume it will lose the ternary portion.
I don't know many people that use the image models as-is compared to text.
Randomdotmath@reddit
I did some testing and the prompt understanding is actually pretty good—quantities, contrast, and positioning all came out accurate. The generation quality is still rough though (lots of finger clipping and spelling mistakes), but damn… running under 0.5s per step on an A10 is actually insane.
Majestic-Volume9996@reddit
I like how their image didn't match their prompt in any way whatsoever.
StartupTim@reddit
What is the web front-end used to make the images, and does it support an API interface?
Thunderstarer@reddit
what the fuck
IrisColt@reddit
Thanks!!!
PhoenixxBR@reddit
If I want to use Flux 2, I can just download Flux and use it in ComfyUI, so why would I download a suspicious program for this?
Another__one@reddit
PrismML doing some god's work lately. Can't wait to see more massive 27B and more ternary models. I know it is expensive to train, but considering that there already are blockchain-based distributed training systems, I would be more than happy to donate all the compute I have to train a model like this. And I guess I am not the only one.
Immediate_Credit_624@reddit
Very cool animation, almost more interesting than the model !
loftybillows@reddit
So sick!!
ANR2ME@reddit
Hmm.. i don't quite understand on the s/image result 🤔 is it faster or slower than the baseline FP16?
ActuatorOk7459@reddit
Wow, that looks cool.