"Browser OS" implemented by Qwen 3.6 35B: The best result I ever got from a local model
Posted by tarruda@reddit | LocalLLaMA | View on Reddit | 38 comments
mister2d@reddit
I love Bijan's channel.
I know you used Q8 but I used the UD-Q4_K_XL and got a fully functional desktop with no errors and local storage. Also passed the "right-click" test.
This is an impressive model. I typically run the "browser OS" test from time to time and it's never this good.
Jungle_Llama@reddit
Oddly, I tried it with UD-Q4_K_XL and none of the apps would open; told it to revise and got the same thing. b8849 Vulkan, with this added to my usual prompt.
mister2d@reddit
I'm using CUDA 13.1. This is my preset that worked great on the first go.
Jungle_Llama@reddit
Thanks for sharing this. I think Vulkan is a bit buggy still. Just saw the reasoning bleed into an answer in the GUI, which I have never seen before. Modified my params to reflect yours and saw a 40% drop in t/s; however, it did indeed perform the task, with the exception of the "special" one. Interesting.
mister2d@reddit
I should have given more context. My setup demands a specific thread count and layer management to get to 128k and 256k ctx.
I'm running some old gear but I have lots of DDR3 ram (256 GB) with two 3060s, and I have CPU affinity to account for as well.
The t/s jumps up and down a bit but the floor is around 37-40 t/s.
I've been doing more web dev tests and I'm perplexed as to why the results continue to be so good.
Jungle_Llama@reddit
Those speeds with DDR3 RAM involved are something I didn't think I'd ever see. I have a bag of them sitting here doing nothing. I adjusted mine to reflect my HW as well: x99, Xeon v4, DDR4 at 2400. I must do some more tests. Cheers.
Jungle_Llama@reddit
Updated to b8855. Now everything works as expected. 1 shot on Q4 XL, saw 30% t/s increase on an addition to the code from 75 t/s to 95 t/s in parts of the edit. Fantastic.
tarruda@reddit (OP)
Yes. TBH I've tried before with 3.6 and also didn't get such a good result, so there was some luck involved. Plus some new CLI args such as temp 1.0 and speculative decoding which I wanted to test.
tarruda@reddit (OP)
In case someone wants to try replicating this locally: I'm using llama.cpp version 8849 (d5b780a67). The complete script I use to run it:
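The OP's actual script wasn't captured in this thread. As a rough, hypothetical sketch of a comparable `llama-server` invocation (the model path, context size, and port are all assumptions, and flag availability can vary between llama.cpp builds, so check `llama-server --help` on your version):

```shell
# Hypothetical sketch, not the OP's script.
# Model path, context size, host, and port below are assumptions.
llama-server \
  -m ./Qwen3.6-35B-Q8_0.gguf \
  --ctx-size 131072 \
  --temp 1.0 \
  --host 127.0.0.1 \
  --port 8080
```

The `--temp 1.0` matches the sampling setting the OP mentions testing; everything else is placeholder.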
Own_Suspect5343@reddit
Does ngram work with qwen?
tarruda@reddit (OP)
Yes. I normally get 50 tokens/second generation with this model. After I asked it to add a feature to the web OS, most of the generation was around 110 tokens/second, since most of the code was already in the prompt.
Ranmark@reddit
When I use a similar script on Qwen 3.6 35B, I get these warnings:
srv load_model: speculative decoding is not supported by multimodal, it will be disabled
srv load_model: swa_full is not supported by this model, it will be disabled
tarruda@reddit (OP)
True, it doesn't support swa-full; this is a template script I use for launching LLMs with llama-server (I used to do this to disable SWA on gpt-oss).
But speculative decoding is working, though it was only merged a couple of days ago: github.com/ggml-org/llama.cpp/pull/19493
Ranmark@reddit
Bruh, they're cooking new releases so fast I couldn't keep up. Thanks for pointing this out. Just updated and can confirm it is working now. Already ran a couple of tasks and I see random boosts to t/s, like from 22 to 29. Damn
tarruda@reddit (OP)
Yea for repeating things already in context it speeds up a lot. So in the web UI if you are iterating on some piece of code (where the model outputs mostly the same code but with fixes) you will see huge speed bumps.
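The speed-up on repeated content can be illustrated with a toy sketch of n-gram drafting. This is a deliberate simplification, not llama.cpp's actual implementation (the function name and parameters here are made up for illustration): when the tail of the generated text matches an n-gram seen earlier in the context, the tokens that followed that earlier occurrence are proposed as a draft, and the model only has to verify them in one batched pass instead of generating them one by one.

```python
def ngram_draft(context, n=3, k=4):
    """Toy n-gram speculation: match the last n tokens of `context`
    against an earlier occurrence in the context itself, and return
    up to k tokens that followed that occurrence as a draft."""
    if len(context) < n + 1:
        return []  # not enough history to both match and draft
    key = tuple(context[-n:])
    # Scan earlier positions, most recent match first.
    for i in range(len(context) - n - 1, -1, -1):
        if tuple(context[i:i + n]) == key:
            return context[i + n:i + n + k]
    return []  # no earlier occurrence: fall back to normal decoding

# When the model is re-emitting code that already appeared earlier
# in the prompt, drafts like this tend to be accepted wholesale.
print(ngram_draft(["a", "b", "c", "X", "a", "b", "c"]))
```

This is why iterating on an existing file is so much faster than generating fresh code: the draft hit rate is high, so many tokens per step come "for free" from verification.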
Small-Challenge2062@reddit
Is vision (image to text) working there?
tarruda@reddit (OP)
Yes
Additional-Curve4212@reddit
hey unrelated question, do you work in corporate or earn some other way? About to graduate soon wondered what y'all do for a living
ikmalsaid@reddit
Cool stuff! How's the speed and did you use a code agent or just the llama.cpp web ui?
tarruda@reddit (OP)
Just llama.cpp web UI. This was one shot, runnable through the preview button.
Total_Ad_133@reddit
Small bug: once you choose a custom color, you can't change back to any of the predefined backgrounds.
kahdeg@reddit
https://jsfiddle.net/8a1fxup2/
Complete_Instance_18@reddit
This is super cool to see for a local model!
mobileJay77@reddit
You don't know what an OS does. Stop calling it that.
Dany0@reddit
My opinion on Bijan, well, I could say it without mincing words, but I cba to check if it's technically allowed by reddit TOS
I interacted with him on this awful place called Twatter. He's exactly as uncurious, self-righteous and gluttonous as you imagine him to be
You can't teach him what an OS is. You can't teach him the Pythagorean theorem. Diagonalize a matrix? You think he would ever sit down and learn linear algebra? He'll ask chatGPT. Watch him do it right now
But I am sure, the YT algorithm will change soon, viewers will switch to better content. And he'll be the same person he ever was, and that is punishment befitting the crime. Sloth
mobileJay77@reddit
Who's Bijan?
leonbollerup@reddit
It was called that probably before you were born.. and it had different names in the past also: WebOS, WebDesktop, etc.
Any-Television693@reddit
No. It was just called a website.
leonbollerup@reddit
It never was.. for those of us who were in the middle of it, it was so much more. Scroll back to the history of 2004-2006, look up names such as mine, words such as "eyes", "windows live", "fenestela", "orcaa" and my favorite: StartForce.
None of these were "merely" a petty website.. it was ingenious coding that pushed the limits of what we could do back then.
finevelyn@reddit
What kind of a website?
leonbollerup@reddit
Somebody doesn't know his history
jacobpederson@reddit
Try this prompt please. Frustrated yet? Now boot up Gemma-4-26b-a4b and watch for a 1 minute one-shot :D
Grouchy_Ad_4750@reddit
If you want to improve your results you could ask the model to split it into multiple files.
For example:
```
Create react app that ...
1) Split into multiple components
2) ...
```
While it is impressive that it can one shot this it isn't really maintainable by model or human. Other than that fun project! :) You could also "host" it on https://jsfiddle.net/ for easy preview
tarruda@reddit (OP)
Not sure if you noticed, but when the LLM returns html code snippets on llama-server web UI, there's an "eye" icon you can click to test a preview. That's why I normally ask for single html file in these tests.
Grouchy_Ad_4750@reddit
Oh, I don't use llama-cpp so I wouldn't know but that's neat :)
For playing with LLMs it's surely enough, but you could also use some coding agent, https://pi.dev/ or something
Depends on your goals of course
Express_Quail_1493@reddit
Aren't these single-file LLM coding tests like browser OS pretty much redundant now that most 2026 LLMs can easily handle this?
tarruda@reddit (OP)
I shared the prompt in the gist, you can try it. While most LLMs can get parts or most of it working, I never had it hit 100% like now.
Note that the prompt has constraints, so it is just not a simple "make a WebOS" prompt, where it could pull results verbatim from its training data.
tarruda@reddit (OP)
If someone wants to try, just save the html from the gist locally and open with a web browser.
I've included the full prompt and response in the gist, but here it is for completeness:
Using html, css and js, generate a browser OS with the following features:
- At least 5 applications
- Three of the 5 applications must be FUNCTIONAL games (tetris, snake and flappy bird)
- Ability to change wallpaper
- A "special" feature that you decide on and document what it is & why it is special.
This is adapted from Bijan Bowen's browser OS prompt, but I found this one to be harder because I specifically request these 3 games.
I don't think I ever got such a perfect response from a local model. AFAICT everything is working 100% correctly.