the longest post on my site, takes 92 KiB instead of 37 KiB. This amounts to an unnecessary 2.5x increase in load time.
Enabling brotli for compression is difficult for Blink because we don't currently ship the compression side of the library and it has a 190KB binary size cost just for the built-in dictionary. Adding anything over 16KB to Chromium requires a good justification.
Alright, so we’re dealing with 92 KiB for gzip vs 37 + 71 KiB for Brotli. Umm…
Uncaught TypeError: c is null
https://purplesyringa.moe/blog/webp-the-webpage-compression-format/:2
webp-the-webpage-compression-format:2:3424
I'm familiar with
Frontend: basics (HTML/CSS/JS), TypeScript, Vue, React (and Next.js), Webpack et al.
Backend: Flask (i.e. Python), Rocket (i.e. Rust), Express (i.e. JS), good old PHP
Sysadmin: mostly Linux, basic systemd, nginx, httpd stuff
Systems programming: C/C++, Rust (including embedded), Python, a bit of Go
Low-level: Linux kernel & modules, x86 assembly and optimization, a bit of compiler internals
High-performance computing: nothing to note in particular, just an unhealthy dose of attachment to performance and experience optimising code for x86
Algorithms & data structures: programming competitions and still continuing bachelor's program
Networking: basics and experience with ZeroNet
Information security: mostly CTFs and high security projects
Open-source: contributed to a few projects and released lots of my own
This isn't much, but it's honest work I'm open to learning more.
It was just the bolded line above that gave me the thought, you have a lot more skills than people my company has hired on massive salaries, you would be surprised at the level of skillset at many companies.
gzip is so cheap everyone enables it by default, but Brotli is way slower.
GitHub doesn’t do that for us, but we can still take advantage of precompressed data. We’ll just have to manually decompress it in JavaScript on the client side.
Google__En_Passant@reddit
This 92KiB body will probably get all sent together in one clump of packets and reach your destination faster than any back and forth negotiations. The increase of load time is literally 0x
imachug@reddit (OP)
There are no back and forth negotiations.
dweezil22@reddit
~~Flying to Mexico for medical procedures b/c US Healthcare is crazy~~
Using WebP to compress a webpage b/c the compression maintainers refuse to standardize Brotli for dumb reasons
imachug@reddit (OP)
I wouldn't call the reasons dumb. Perhaps some people are overly pessimistic, but the concerns are well-formed, if misguided.
Jonathan_the_Nerd@reddit
This is the first time I've ever seen the word "Brotli". (I'm not a Web developer. I'm not really a developer at all. I'm a sysadmin who sometimes writes small programs.) Is there a summary available on why maintainers don't want to implement it?
MINIMAN10001@reddit
Brotli is a compression algorithm that excelled in slow but high compression, with fast decompression.
Which is pretty ideal for usage on the web.
3inthecorner@reddit
Browsers currently only have the decompression algorithm included but the web compression API also offers compression. They don't want to just offer the decompression API because it would be confusing but they also don't want to add the relatively large compressor.
dweezil22@reddit
This sentence upset me. There are likely petabytes of waste going across the wire today b/c someone was worried about < 200kb install size while also insisting that compression must be symmetrical lest it confuse ppl.
Admittedly I'm reading this issue blind so I might be missing other context, but this feels very pennywise pound-foolish.
tyjuji@reddit
It's a ridiculous sentence. Even 200 megabytes is fuck all on a modern system.
Swimming-Cupcake7041@reddit
There are many non-modern systems that run Blink/Chrome.
Chii@reddit
and that's how you end up with hundreds of electron apps!
franklindstallone@reddit
Shame web developers don't care about more people and try to keep their software lean and optimized rather than hoping everyone else will fix their JS bloat.
Plank_With_A_Nail_In@reddit
software being lean is not the same as it being optimized...not close to the same.
franklindstallone@reddit
Which is why I used and rather than or.
You can't necessarily have both and certainly not in equal measures but would be good to keep them in mind.
By pushing it onto compression, as this article states and many others the compression time is longer so it's doing more work, there's potentially a higher cost. Why would we burn more energy because someone doesn't want to just make their site / app lighter?
Maybe it does work out as a benefit overall but given web performance rarely talks about how green something is, I bet it's not commonly understood.
inu-no-policemen@reddit
That reasoning is from the days when Chrome was like 10MB. (Same with Firefox.)
It's now over 100MB.
PhysicalMammoth5466@reddit
Well formed? 190kb too large in 106MB app? For a major feature? If that's what you're calling a well-formed concern I'll be calling you stupid
imachug@reddit (OP)
I think you are underestimating the amount of work put into reducing the binary size. I bet Chromium would be a lot bigger than it is now if the developers were free to waste space on any major features.
PhysicalMammoth5466@reddit
I guess you're stupid. You linked me to actual binary size not the 100+mb distributable, where the dictionary wouldn't be (or technically it can be since anything can be in a binary). We probably have different definitions of major if you think it'd be something that happens often
imachug@reddit (OP)
I'm just saying that folks at Google clearly care about size, using the wiki page as an example. I don't appreciate being called stupid, moreso for disagreeing on the grounds of values instead of objective facts.
bloomstein@reddit
This prevents the browser from stream-rendering the page as it's downloaded. Neat idea otherwise, though!
imachug@reddit (OP)
I only compress the data below viewport, so the browser can still stream-render the first part of the page and give good first impressions.
But yeah, it's not ideal.
bloomstein@reddit
Perhaps you could emulate HTML stream rendering by stream rendering the webp image as it’s downloaded and appending the html bytes to body
imachug@reddit (OP)
That's waaaaaaaaaaay above my paygrade and if you're manually decoding stuff, you might as well use a custom compression format. The implementation is going to be different, unrelated to this project, and have different area of application. A neat idea though.
guest271314@reddit
Nice work.
Balance-@reddit
Fun read, thanks!
Don’t forget to upvote the root issue: https://github.com/whatwg/compression/issues/34
bleachisback@reddit
My browser doesn't load anything after "Alright, so we’re dealing with 92 KiB for gzip vs 37 + 71 KiB for Brotli. Umm…"
I see other people talking about canvases, so I suspect you're using the technique you talk about in this very post, but my browser doesn't seem to like it. Gives a console error
galambalazs@reddit
Doesn’t work for me in mobile Safari either
Ytrog@reddit
I love the idea. Very clever. I wonder how JPEG-XL would fare in this case. 👀
Maybe it would be a good idea to add a column to your metrics with the entropy, as that determines how compressible something is. 🤔
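A rough sketch of the entropy metric suggested above, assuming a simple order-0 (per-byte) Shannon entropy; the function name `byte_entropy` and the sample data are made up for illustration. Note that this measure ignores repetition, which LZ-style compressors exploit, so it is only a coarse indicator of compressibility.

```python
import math
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy of `data`, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


# Eight equally frequent symbols give 3 bits per byte, i.e. an order-0
# estimate of 800 * 3 / 8 = 300 bytes for this input.
repetitive = b"abcdefgh" * 100
estimate_bytes = byte_entropy(repetitive) * len(repetitive) / 8

# A real compressor sees the repeated pattern and does far better than
# the order-0 estimate suggests.
compressed_size = len(zlib.compress(repetitive, 9))
```

Comparing `estimate_bytes` with `compressed_size` shows why an entropy column would need to be read alongside the actual compressed sizes rather than instead of them.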
niutech@reddit
It doesn't work in all web browsers (e.g. LibreWolf, Sailfish Browser) - I just see an empty space after Umm…. As long as it is not universally accessible with a fallback to plain HTML, it shouldn't be widely used.
TheAznCoderPro@reddit
~~As long as it is not universally accessible with a fallback to plain HTML,~~ it shouldn't be widely used.
kevincox_ca@reddit
This is a clever idea. I've been wanting to use compression on short strings passed as URL parameters (imagine sharing documents or recipes entirely in the URL hash). Now that the Compression Streams API is widely implemented I'll have to give it another crack.
But if you are doing this you should really include the full content in the feed. Because now my feed reader just gets a snippet and <div style=height:100000px> after trying to scrape the page. It looks like you have only implemented it for this post, so that is nice. But it would be annoying if this became the new standard.
One major concern is performance. Especially on low-end devices doing this in JavaScript will easily negate any savings. It seems that in general network bandwidth is growing faster than CPU speed. And especially since I believe setting document.documentElement.innerHTML will use a main-thread blocking parser rather than the regular streaming parser that will be used for the main document during download. So you are replacing a background download of content that the user probably hasn't read up to yet with a UI blocking main-thread decompression.
A very cool demo, but I think the conclusion is that the real solution is to replace GitHub pages with a better server. For example better cache headers, proper asset versioning and newer compression standard.
axonxorz@reddit
Web Worker?
kevincox_ca@reddit
That could help with the decompression. But you still need to actually inject the new HTML at some point, which is likely the majority of the cost.
imachug@reddit (OP)
I'm using a different approach to pass data via URL parameters. gzip and co. have large headers and dictionaries, you probably want something smaller. lz-string in particular turned out to be a better choice in my experiments.
Also, domain-specific compression helps greatly. Using arithmetic coding with a hard-coded fine-tuned entropy distribution helped me compress source code significantly.
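To put a number on the header overhead for short strings, here is a minimal sketch using Python's `zlib` module as a stand-in for the browser API (an assumption; the thread is about JavaScript). It compresses the same short string as raw deflate, zlib, and gzip. The recipe text and the helper name `deflate` are made up; lz-string is not shown because it is not in the standard library.

```python
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress `data` at level 9; `wbits` selects the container format."""
    compressor = zlib.compressobj(level=9, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


# Hypothetical short payload of the kind you might put in a URL hash.
text = b"2 eggs, 100 g flour, 250 ml milk, pinch of salt"

raw = deflate(text, -15)  # raw deflate: no header, no checksum
zl = deflate(text, 15)    # zlib: 2-byte header + 4-byte Adler-32
gz = deflate(text, 31)    # gzip: 10-byte header + 8-byte trailer

sizes = {"input": len(text), "deflate-raw": len(raw), "zlib": len(zl), "gzip": len(gz)}
```

The deflate stream itself is identical in all three cases, so the differences in `sizes` are pure framing overhead. On inputs this small that overhead is a noticeable fraction of the total.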
kevincox_ca@reddit
Yeah, I was wondering about using deflate-raw and see how much data it took before it had a notable positive improvement. For short strings you probably won't gain much. If br was supported you could cheat for web content because it ships a web-focused dictionary. But this won't help you too much for general compression.
But for things like documents and recipes I suspect that you can get a notable improvement pretty quickly. (Although things this size are probably not the best for URL parameters in general, but it is nice if you want to put a quick site up without worrying about user data.)
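The effect of a shipped dictionary can be illustrated without Brotli. The sketch below swaps in zlib's preset-dictionary feature (`zdict`), which is a different mechanism from Brotli's built-in dictionary but shows the same idea: both sides already know common substrings, so a short page compresses better. The dictionary and page contents are made up for illustration.

```python
import zlib

# Hypothetical shared dictionary of common HTML boilerplate. Both the
# compressing and the decompressing side must hold the exact same bytes.
HTML_DICT = b"<!DOCTYPE html><html><head><title></title></head><body><p></p></body></html>"


def compress(data: bytes, zdict: bytes = b"") -> bytes:
    """zlib-compress `data`, optionally priming the window with `zdict`."""
    if zdict:
        compressor = zlib.compressobj(level=9, zdict=zdict)
    else:
        compressor = zlib.compressobj(level=9)
    return compressor.compress(data) + compressor.flush()


def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Inverse of `compress`; needs the same dictionary, if one was used."""
    if zdict:
        decompressor = zlib.decompressobj(zdict=zdict)
    else:
        decompressor = zlib.decompressobj()
    return decompressor.decompress(blob) + decompressor.flush()


page = b"<html><head><title>Recipe</title></head><body><p>Mix flour and water.</p></body></html>"

plain = compress(page)
primed = compress(page, HTML_DICT)
```

Here `primed` should come out smaller than `plain` because the markup around the text is found in the dictionary. As noted above, the gain depends on the content resembling the dictionary, so it does not carry over to general data.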
imachug@reddit (OP)
Doesn't your reader support <noscript>? I'm not sure how I'm supposed to handle clients that don't respect it but also don't support JS.
As for the other concerns, yeah, I agree. This was mostly a fun little idea that stuck in my mind rather than anything terribly practical.
kevincox_ca@reddit
The only thing in the <noscript> is a meta refresh which I suspect nearly no readers support. Most readers aren't "full browsers".
Probably it would be good to also add a message like "Sorry, this post requires JS to view" in the <noscript> as well.
imachug@reddit (OP)
True that. I've updated the feed to use a no-JS version. Thanks for a bug report! :)
agentoutlier@reddit
I was reading and thinking damn this person is gifted and knowledgeable.
Click on the about... 19 years old! Goddamn that is impressive.
Successful-Peach-764@reddit
She is amazing, read the bio and see the imposter syndrome at work, I guess everyone has doubts about their skills.
Love seeing the new generation sharing their ideas.
imachug@reddit (OP)
Thank you for your kind words :) If you don't mind, could you please describe what screamed "imposter syndrome" to you? I know I have it and I try to battle it, but apparently my efforts weren't good enough (lol).
Successful-Peach-764@reddit
Ah I didn't realise OP is the same person as the writer, it is just an observation after reading your bio, it was just in passing so don't take too seriously.
And the above is just a summary, the verbose list is even more impressive, so you were contributing since you were 12 if my math is right lol.
Anyway, thanks for sharing your work, I hope you reach greater heights, it is great I can use these examples to inspire my nieces in the future, women like Justine Tunney who created redbean, Freya Holmér are inspirational and showcase the talent that's great to see.
I like the reddit thread integration, comments and feedback from wider world, obviously you can't control the feedback but it is still great idea.
Kwinten@reddit
I'm quite sure what they meant was that they get imposter syndrome from reading everything you've already accomplished at your age.
gwern@reddit
https://hero.fandom.com/wiki/Alisa_Selezneva I'm guessing.
jeffcgroves@reddit
Isn't .webp already being used for images/videos?
atomic1fire@reddit
This is for compressing the entire page, not just images and video.
nemothorx@reddit
You should read before commenting
shevy-java@reddit
Now Linus would be happy to invite back Rust devs into the Kernel!
The C folks didn't come up with this solution. It took a Rustee for the win.
YetAnotherRobert@reddit
This is almost "thanks, I hate it" levels of clever.
Nicely researched and executed!
oblong_pickle@reddit
I just see what I presume is binary data, nothing else
bruhprogramming@reddit
.moe domains my beloved
starm4nn@reddit
So this does make webpages dependent on the Canvas API, which is a huge disadvantage.
LightShadow@reddit
I've implemented something similar on our website, albeit not this fancy and technical, and we had to make major adjustments to the MVP because the <canvas> API is inconsistent, slow, and resource intensive. It's also not reliably available as discussed in the blog article because it's unsafe.
My solution was to pre-compress the data as PNGs and use the <img> tag to deconstruct the base64-encoded images.
Ecksters@reddit
A really interesting application of this could definitely be compressing uploads in cases where you know they're going to be sending you highly compressible data.
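The PNG approach mentioned by LightShadow above is not described in detail, so the following is only a sketch of the general idea under stated assumptions: arbitrary bytes are stored as 8-bit grayscale pixels, and PNG's deflate-compressed image data does the compression. The function names, the 64-pixel width, and the 4-byte length prefix are all hypothetical choices. The decoder only handles files produced by `pack_png` (filter type 0 on every row).

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, body: bytes) -> bytes:
    """Build one PNG chunk: length, type, body, CRC-32 over type + body."""
    crc = zlib.crc32(kind + body)
    return struct.pack(">I", len(body)) + kind + body + struct.pack(">I", crc)


def pack_png(data: bytes, width: int = 64) -> bytes:
    """Store `data` as the pixels of an 8-bit grayscale PNG."""
    # Prefix the real length so padding can be stripped when decoding.
    payload = struct.pack(">I", len(data)) + data
    payload += b"\x00" * (-len(payload) % width)
    height = len(payload) // width
    # Each scanline starts with a filter-type byte; 0 means "no filter".
    rows = b"".join(
        b"\x00" + payload[y * width:(y + 1) * width] for y in range(height)
    )
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(rows, 9))
        + _chunk(b"IEND", b"")
    )


def unpack_png(png: bytes) -> bytes:
    """Recover the bytes stored by `pack_png`."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, width, idat = len(PNG_SIGNATURE), 0, b""
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        body = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC
        if kind == b"IHDR":
            (width,) = struct.unpack(">I", body[:4])
        elif kind == b"IDAT":
            idat += body
    rows = zlib.decompress(idat)
    # Drop the filter byte at the start of every scanline.
    payload = b"".join(
        rows[i + 1:i + 1 + width] for i in range(0, len(rows), width + 1)
    )
    (size,) = struct.unpack(">I", payload[:4])
    return payload[4:4 + size]


html = b"<p>hello</p>" * 500
png = pack_png(html)
```

In a browser the decoding side would rely on the image decoder instead of `unpack_png`; this round trip only shows that the container carries arbitrary data and shrinks repetitive input.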
tylian@reddit
Extremely cursed and extremely well done. I was reading this on my phone and had a suspicion that the page I was reading used the technique mentioned, but didn't have any idea it came into effect past a specific point, so the transition was seamless. I'd call that a win for an experiment, good job!
mr_birkenblatt@reddit
Why does adding noise prevent fingerprinting? I'd love to hear the reasoning behind this
scratchisthebest@reddit
Generally canvas fingerprinting is done by drawing some system-dependent stuff onto a canvas (hardware acceleration, 3d shapes, fonts, emojis &c) and hashing the pixels of the canvas. If the telemetry server sees 2 pageviews that computed the same canvas hash, it's a signal that the pageviews might be from the same browser. Adding noise means the hash will always be unique so it can't be used to correlate pageviews across visits in this way.
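The explanation above can be sketched as a toy simulation. A byte string stands in for the canvas pixels and SHA-256 for the fingerprint hash; the noise model (flipping the lowest bit of a few random pixels) is an assumption for illustration, not a description of what any particular browser does.

```python
import hashlib
import random


def canvas_hash(pixels: bytes) -> str:
    """Fingerprint: a hash over every pixel value."""
    return hashlib.sha256(pixels).hexdigest()


def add_low_bit_noise(pixels: bytes, rng: random.Random, count: int = 8) -> bytes:
    """Flip the lowest bit of `count` randomly chosen pixels."""
    noisy = bytearray(pixels)
    for index in rng.sample(range(len(noisy)), count):
        noisy[index] ^= 1
    return bytes(noisy)


# Stand-in for what one particular machine renders onto a 64x64 canvas.
rendered = bytes((x * 7 + 13) % 256 for x in range(64 * 64))

# Without noise, two visits from the same machine hash identically.
stable = canvas_hash(rendered) == canvas_hash(rendered)

# With per-read noise, each visit sees slightly different pixels.
visit_a = add_low_bit_noise(rendered, random.Random(1))
visit_b = add_low_bit_noise(rendered, random.Random(2))
```

Only a handful of pixels change by one, which is invisible to a viewer, yet the hashes of `visit_a` and `visit_b` no longer match each other or the unmodified rendering.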
mr_birkenblatt@reddit
Thanks, wouldn't masking out the lower bits before hashing completely defeat the purpose of the noise?
MereInterest@reddit
Possibly, but it depends on the type of noise. Currently, it looks like a few low bits on random pixels are changed, but there's nothing requiring that type of noise.
Hashing algorithm ignores the low bits on each pixel? The noise could return an adjacent pixel instead of altering the value of the current pixel.
Hashing algorithm averages over some region? The same noise to the low bits could be applied to all pixels in a small region. (This hashing would likely also defeat the point of the fingerprinting, since it would average out small differences in rendering engines that the hashing is trying to detect.)
It's a cat and mouse game, where unethical websites try to find more ways to spy on users, and browsers try to find more ways to stop them from doing so. If websites start adjusting the hash they use to fingerprint users, then browsers can and should update their protections to match the new threat.
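A toy simulation of the first scenario above, with both noise models being assumptions made for illustration: masking the low bits before hashing cancels noise that only touches low bits, but not noise that returns an adjacent pixel's value instead.

```python
import hashlib
import random


def masked_hash(pixels: bytes, bits: int = 2) -> str:
    """Hash the pixels after zeroing the lowest `bits` bits of each one."""
    mask = (0xFF << bits) & 0xFF
    return hashlib.sha256(bytes(p & mask for p in pixels)).hexdigest()


def low_bit_noise(pixels: bytes, rng: random.Random, count: int = 8) -> bytes:
    """Noise model 1: flip the lowest bit of a few random pixels."""
    noisy = bytearray(pixels)
    for index in rng.sample(range(len(noisy)), count):
        noisy[index] ^= 1
    return bytes(noisy)


def neighbor_noise(pixels: bytes, rng: random.Random, count: int = 8) -> bytes:
    """Noise model 2: a few random pixels report their right neighbor's value."""
    noisy = bytearray(pixels)
    for index in rng.sample(range(len(noisy) - 1), count):
        noisy[index] = pixels[index + 1]
    return bytes(noisy)


# Stand-in rendering in which adjacent pixels differ by 7, so a neighbor's
# value always differs above the two masked bits.
rendered = bytes((x * 7 + 13) % 256 for x in range(64 * 64))

masking_beats_low_bit_noise = (
    masked_hash(low_bit_noise(rendered, random.Random(1))) == masked_hash(rendered)
)
masking_beats_neighbor_noise = (
    masked_hash(neighbor_noise(rendered, random.Random(1))) == masked_hash(rendered)
)
```

In this model `masking_beats_low_bit_noise` is true and `masking_beats_neighbor_noise` is false, which matches the point that the protection can change its noise whenever the hash adapts.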
DavidJCobb@reddit
For fonts and emojis, it seems like someone could work around this and still fingerprint users by drawing to an oversized canvas (say, 3x scale), pulling the image data into a plain array (so it gets fuzzed this one time), downscaling the data by hand to shrink the fuzz out of existence, and then hashing that.
birdbrainswagtrain@reddit
Is this correct? I was under the impression that these new-fangled compression algorithms were designed to prioritize speed just as much as size. I'm no expert, but most of the results of a quick search seem to contradict this.
Really neat article though.
ProgramTheWorld@reddit
Definitely a fun read. I’ve never thought about using an image to compress arbitrary data.
Perhaps a downside to working in the industry is that I kinda lost this creative thinking. A more practical solution would be to defer load content so that the 30KB vs 80KB difference becomes insignificant but that’s no fun at all.
jfedor@reddit
Did you benchmark actual page load times?
nicholashairs@reddit
Love me a good "just because you can doesn't mean you should but that didn't stop me".
MorbidAmbivalence@reddit
I do love the cursed and creative workarounds devs come up with. The bit about data randomization from canvas was a surprise. Super weird that some APIs are affected and not others.
agumonkey@reddit
Sweet out-of-the-box work. Kudos
usrlibshare@reddit
Sopel97@reddit
That's pretty clever. And I'm surprised by how good it ends up. How does it compare regarding decompression speed?
RoboticElfJedi@reddit
Fun read. Why I come to this sub.
narnach@reddit
I love it when people combine existing tools in novel ways. This is brilliant!