"Browser OS" implemented by Qwen 3.6 35B: The best result I ever got from a local model
Posted by tarruda@reddit | LocalLLaMA | View on Reddit | 38 comments
mister2d@reddit
I love Bijan's channel.
I know you used Q8 but I used the UD-Q4_K_XL and got a fully functional desktop with no errors and local storage. Also passed the "right-click" test.
This is an impressive model. I typically run the "browser OS" test from time to time and it's never this good.
Jungle_Llama@reddit
Oddly, I tried it with UD-Q4_K_XL and none of the apps would open; told it to revise and got the same thing. b8849 Vulkan, with this added to my usual prompt.
mister2d@reddit
I'm using CUDA 13.1. This is my preset that worked great on the first go.
Jungle_Llama@reddit
Thanks for sharing this. I think Vulkan is a bit buggy still. Just saw the reasoning bleed into an answer in the GUI, which I have never seen before. Modified my params to reflect yours and saw a 40% drop in t/s; however, it did indeed perform the task, with the exception of the "special" one. Interesting.
mister2d@reddit
I should have given more context. My setup demands a specific thread count and layer management to get to 128k and 256k ctx.
I'm running some old gear but I have lots of DDR3 ram (256 GB) with two 3060s, and I have CPU affinity to account for as well.
The t/s jumps up and down a bit but the floor is around 37-40 t/s.
I've been doing more web dev tests and I'm perplexed as to why the results continue to be so good.
Jungle_Llama@reddit
Those speeds with DDR3 RAM involved are something I didn't think I'd ever see. I have a bag of them sitting here doing nothing. I adjusted mine to reflect my HW as well: x99, Xeon v4, DDR4 at 2400. I must do some more tests. Cheers.
Jungle_Llama@reddit
Updated to b8855. Now everything works as expected. 1 shot on Q4 XL, saw 30% t/s increase on an addition to the code from 75 t/s to 95 t/s in parts of the edit. Fantastic.
tarruda@reddit (OP)
Yes. TBH I've tried before with 3.6 and also didn't get such a good result, so there was some luck involved. Plus some new CLI args such as temp 1.0 and speculative decoding which I wanted to test.
tarruda@reddit (OP)
In case someone wants to try replicating this locally: I'm using llama.cpp version 8849 (d5b780a67). The complete script I use to run it:
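The OP's actual script wasn't captured in this thread. As a rough, hypothetical sketch of a comparable `llama-server` invocation (the model path, context size, and port are all assumptions, and flag availability can vary between llama.cpp builds, so check `llama-server --help` on your version):

```shell
# Hypothetical sketch, not the OP's script.
# Model path, context size, host, and port below are assumptions.
llama-server \
  -m ./Qwen3.6-35B-Q8_0.gguf \
  --ctx-size 131072 \
  --temp 1.0 \
  --host 127.0.0.1 \
  --port 8080
```

The `--temp 1.0` matches the sampling setting the OP mentions testing; everything else is placeholder.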
Own_Suspect5343@reddit
Does ngram work with qwen?
tarruda@reddit (OP)
Yes. I normally get 50 tokens/second generation with this model. After I asked it to add a feature to the web OS, most of the generation was around 110 tokens/second, since most of the code was already in the prompt.
Ranmark@reddit
When I use a similar script on Qwen 3.6 35B, I get these warnings:
srv load_model: speculative decoding is not supported by multimodal, it will be disabled
srv load_model: swa_full is not supported by this model, it will be disabled
tarruda@reddit (OP)
True, it doesn't support swa-full; this is a template script I use for launching LLMs with llama-server (I used to do this to disable SWA on gpt-oss).
But speculative decoding is working, though it was only merged a couple of days ago: github.com/ggml-org/llama.cpp/pull/19493
Ranmark@reddit
Bruh, they're cooking new releases so fast I couldn't keep up. Thanks for pointing this out. Just updated and can confirm it is working now. Already ran a couple of tasks and I see random boosts to t/s, like from 22 to 29. Damn
tarruda@reddit (OP)
Yea for repeating things already in context it speeds up a lot. So in the web UI if you are iterating on some piece of code (where the model outputs mostly the same code but with fixes) you will see huge speed bumps.
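The speed-up on repeated content can be illustrated with a toy sketch of n-gram drafting. This is a deliberate simplification, not llama.cpp's actual implementation (the function name and parameters here are made up for illustration): when the tail of the generated text matches an n-gram seen earlier in the context, the tokens that followed that earlier occurrence are proposed as a draft, and the model only has to verify them in one batched pass instead of generating them one by one.

```python
def ngram_draft(context, n=3, k=4):
    """Toy n-gram speculation: match the last n tokens of `context`
    against an earlier occurrence in the context itself, and return
    up to k tokens that followed that occurrence as a draft."""
    if len(context) < n + 1:
        return []  # not enough history to both match and draft
    key = tuple(context[-n:])
    # Scan earlier positions, most recent match first.
    for i in range(len(context) - n - 1, -1, -1):
        if tuple(context[i:i + n]) == key:
            return context[i + n:i + n + k]
    return []  # no earlier occurrence: fall back to normal decoding

# When the model is re-emitting code that already appeared earlier
# in the prompt, drafts like this tend to be accepted wholesale.
print(ngram_draft(["a", "b", "c", "X", "a", "b", "c"]))
```

This is why iterating on an existing file is so much faster than generating fresh code: the draft hit rate is high, so many tokens per step come "for free" from verification.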
Small-Challenge2062@reddit
Is vision (image to text) working there?
tarruda@reddit (OP)
Yes
Additional-Curve4212@reddit
hey unrelated question, do you work in corporate or earn some other way? About to graduate soon wondered what y'all do for a living
ikmalsaid@reddit
Cool stuff! How's the speed and did you use a code agent or just the llama.cpp web ui?
tarruda@reddit (OP)
Just llama.cpp web UI. This was one shot, runnable through the preview button.
Total_Ad_133@reddit
Small bug: once you choose a custom color, you can't change back to any of the predefined backgrounds.
kahdeg@reddit
https://jsfiddle.net/8a1fxup2/
Complete_Instance_18@reddit
This is super cool to see for a local model!
mobileJay77@reddit
You don't know what an OS does. Stop calling it that.
Dany0@reddit
My opinion on Bijan, well, I could say it without mincing words, but I cba to check if it's technically allowed by reddit TOS
I interacted with him on this awful place called Twatter. He's exactly as uncurious, self-righteous and gluttonous as you imagine him to be
You can't teach him what an OS is. You can't teach him the Pythagorean theorem. Diagonalize a matrix? You think he would ever sit down and learn linear algebra? He'll ask chatGPT. Watch him do it right now
But I am sure, the YT algorithm will change soon, viewers will switch to better content. And he'll be the same person he ever was, and that is punishment befitting the crime. Sloth
mobileJay77@reddit
Who's Bijan?
leonbollerup@reddit
It was called that probably before you were born.. and it had different names in the past also: WebOS, WebDesktop, etc.
Any-Television693@reddit
No. It was just called a website.
leonbollerup@reddit
It never was.. for those of us who were in the middle of it, it was so much more. Scroll back to the history of 2004-2006, look up names such as mine, words such as "eyes", "windows live", "fenestela", "orcaa" and my favorite: StartForce.
None of these were "merely" a petty website.. it was ingenious coding that pushed the limits of what we could do back then.
finevelyn@reddit
What kind of a website?
leonbollerup@reddit
Somebody doesn't know his history
jacobpederson@reddit
Try this prompt please. Frustrated yet? Now boot up Gemma-4-26b-a4b and watch for a 1 minute one-shot :D
Grouchy_Ad_4750@reddit
If you want to improve your results you could ask the model to split it into multiple files.
For example:
```
Create react app that ...
1) Split into multiple components
2) ...
```
While it is impressive that it can one shot this it isn't really maintainable by model or human. Other than that fun project! :) You could also "host" it on https://jsfiddle.net/ for easy preview
tarruda@reddit (OP)
Not sure if you noticed, but when the LLM returns html code snippets on llama-server web UI, there's an "eye" icon you can click to test a preview. That's why I normally ask for single html file in these tests.
Grouchy_Ad_4750@reddit
Oh, I don't use llama-cpp so I wouldn't know but that's neat :)
For playing with LLMs it's surely enough, but you could also use some coding agent, https://pi.dev/ or something
Depends on your goals of course
Express_Quail_1493@reddit
Aren't these single-file LLM coding tests like browser OS pretty much redundant now that most 2026 LLMs can easily handle this?
tarruda@reddit (OP)
I shared the prompt in the gist, you can try it. While most LLMs can get parts or most of it working, I never had it hit 100% like now.
Note that the prompt has constraints, so it is just not a simple "make a WebOS" prompt, where it could pull results verbatim from its training data.
tarruda@reddit (OP)
If someone wants to try, just save the html from the gist locally and open with a web browser.
I've included the full prompt and response in the gist, but here it is for completeness:
Using html, css and js, generate a browser OS with the following features:
- At least 5 applications
- Three of the 5 applications must be FUNCTIONAL games (tetris, snake and flappy bird)
- Ability to change wallpaper
- A "special" feature that you decide on and document what it is & why it is special.
This is adapted from Bijan Bowen's browser OS prompt, but I found this one to be harder because I specifically request these 3 games.
I don't think I ever got such a perfect response from a local model. AFAICT everything is working 100% correctly.