By reserving 4GB of memory for all 32-bit WebAssembly modules, it is impossible to go out of bounds. The largest possible pointer value, 2^32-1, will simply land inside the reserved region of memory and trap. This means that, when running 32-bit wasm on a 64-bit system, we can omit all bounds checks entirely
This optimization is impossible for Memory64.
Furthermore, the WebAssembly JS API constrains memories to a maximum size of 16GB.
Oobimankinoobi@reddit
Time for an intermediate size
wretcheddawn@reddit
I'm certainly no expert on WASM, but the OS already detects out-of-bounds memory accesses. Is it possible to rely on the existing checks?
It also sounds like they are remapping the memory in software already. How is that not more of a performance hit than the length check?
C5H5N5O@reddit
That's not the actual issue. The core issue is isolation. If you don't bound memory accesses to just the wasm module's heap/memory you can technically access any currently mapped memory (e.g. the process's stack, heap, etc.).
wretcheddawn@reddit
Wouldn't that also be a problem in 32-bit?
tesfabpel@reddit
What if they used a "zygote" process (a la Android) that gets forked for each wasm module, with the jitted code inserted there, allowing the OS to trap OOB memory accesses?
The zygote part would allow them to share common IPC code for talking to the browser's runtime...
On Windows they may have to do something different, since IDK if there's a fork equivalent there...
Qweesdy@reddit
The OS doesn't/cannot reliably detect out of bounds memory accesses. For example, let's say you have a 1 MiB array, but the index is wrong causing a read to be past the end of the array. "Past the end of the array" might be some other data (or code, or a shared library, or anything else) and the CPU won't detect that anything is wrong at all because that memory is still valid (for a different purpose), so the OS won't be informed that anything is wrong, so the OS is literally incapable of doing anything about it.
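This can be made concrete with a small (hypothetical) C example: two logical arrays living in one heap allocation, where reading past the first lands in the second without any fault:

```c
#include <stdlib.h>

// One allocation holding two logical arrays. Indexing past `a` lands in
// `b` -- perfectly valid memory as far as the CPU and OS are concerned,
// so no fault is raised even though the access is logically out of bounds.
int oob_read_goes_unnoticed(void) {
    int *block = malloc(8 * sizeof(int));
    int *a = block;      // logical array of 4 ints
    int *b = block + 4;  // unrelated data placed right after it
    for (int i = 0; i < 4; i++) a[i] = 0;
    for (int i = 0; i < 4; i++) b[i] = 42;
    int v = a[4];        // "past the end" of a: silently reads b[0]
    free(block);
    return v;            // 42, with no trap from the OS
}
```

The hardware only traps when an access hits unmapped or protected pages, which is exactly why engines reserve guard regions around WASM memory.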
190n@reddit
With 32-bit WASM pointers, the only remapping that's necessary is one addition, to add the WASM pointer to the base address where the WASM memory starts in the host address space. This has a cost but it's completely trivial compared to a branch checking if the pointer is in-bounds. Simple integer arithmetic is far cheaper than branching on modern CPUs.
190n@reddit
Good article.
What is the reason for the 16GB limit?
badpotato@reddit
If each Chrome tab starts using more than 16GB it could be problematic for the end user... I think there should be a permission system for when a tab starts using too much memory.
umtala@reddit
Can they not just mask the pointer with 0x3ffffffff on access?
Uristqwerty@reddit
Unless each WASM sandbox is running in its own process and can somehow claim the entire <4G address space as an unbroken block, without any pesky non-relocatable DLLs inserting themselves there, etc., it would need to add a heap-start offset after masking the pointer
Works out fine, though. As far as I'm aware, current architectures tend to automatically zero-extend 32-bit values when storing them in 64-bit registers, so the mask can be entirely implicit, a side effect of the previous instruction.
monocasa@reddit
Masking every dirty pointer is a form of a bounds check.
umtala@reddit
For me "bounds check" means a branch. An extra bitwise AND before the offset access is essentially free.
monocasa@reddit
In a lot of cases an extra ALU op and a well-predicted branch (which a bounds check should be) will be basically the same cost.
umtala@reddit
We're talking about "pointers" but they are pointers in the WASM sandbox, i.e. offsets into a WASM memory object, not pointers into the process address space.
In the 32-bit case:
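The snippet that belonged here appears to have been lost; roughly, assuming `base` is the host address where the WASM memory starts (`to_host_32` is a made-up name):

```c
#include <stdint.h>

// 32-bit case: the offset physically cannot exceed 2^32 - 1, so with a
// 4 GiB guard region after `base` there is no mask and no branch --
// just a zero-extend (free on most ISAs) and one add.
uint8_t *to_host_32(uint8_t *base, uint32_t wasm_ptr) {
    return base + (uint64_t)wasm_ptr;
}
```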
In the 64-bit (34-bit?) case:
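The snippet for this case also seems to have been lost; a hypothetical reconstruction (`to_host_64` is a made-up name):

```c
#include <stdint.h>

// 64-bit (34-bit) case: the offset must first be masked down to
// 34 bits, and the add cannot begin until the mask has completed --
// the mask and add form a serial data-dependency chain.
uint8_t *to_host_64(uint8_t *base, uint64_t wasm_ptr) {
    return base + (wasm_ptr & ((1ull << 34) - 1));
}
```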
monocasa@reddit
I mean, the extra data dependency is visible there. You can't schedule the addition until the and has completed. A test and branch could be happening in parallel.
Qweesdy@reddit
The purpose of a bounds check is to detect when the pointer is wrong. Failing to detect that the pointer is wrong because it wrapped or was masked is a failure to bother doing any bounds checking. It's the opposite of a bounds check, it's a "bounds uncheck".
evilpies@reddit
Unless I am missing something, this forces all access to be in bounds, but WASM actually wants to trap on OOB.
umtala@reddit
Seems like it should be an option if trapping is so much more expensive. I'm using Rust so I don't care about it trapping, I'll take the full performance please.
Ronin-s_Spirit@reddit
Why? I thought WASM was basically one solid array buffer; in that case, having a buffer big enough to need 64-bit pointers without choking RAM sounds unlikely.
New_Enthusiasm9053@reddit
32 bits can address 4GB, which isn't all that much when WASM is also intended as a cross-platform distribution method. Anything with a wasm compiler, which is simple to build by design, would be able to run it. We already have CPUs with 1GB of L3 cache; not moving to 64 bits in the next few years will cause problems in the immediate future.
I don't think the contiguous block stuff matters, except maybe for performance. Every process gets a virtual memory space that appears contiguous anyway and is handled by the OS internally; not all your pages are physically contiguous to begin with, even if they appear to be. If a page isn't loaded, a page fault is triggered and the OS loads it into any freely available physical page. Similarly, it'll evict pages to disk if it needs the memory elsewhere.
That's how I understand it to work, people who know better can hopefully illuminate this further.
elmuerte@reddit
That makes me sad to hear.
New_Enthusiasm9053@reddit
Even if the program code is 2MB, the user data can be any size. A web-based Excel, for example, wouldn't want to arbitrarily limit itself to a mere 4 billion cells. That's only 4 million rows * 1000 columns, which is pretty easy to exceed by the idiots who use Excel as a database.
Alternatively, a web-based video editor or game will easily need more than 4 GB even if it's optimally efficient in terms of memory layout.
4GB isn't much in many, many contexts, and wasm is intended to serve all possible applications on the web.
elmuerte@reddit
That makes me even sadder to hear.
New_Enthusiasm9053@reddit
I mean, ok, if solving problems for people makes you sad then you're in the wrong field.
elmuerte@reddit
People have a problem running wasteful software. 4GiB of memory is an enormous amount of memory. It is not enough for every possible workload you can imagine, but calling it "not all that much" is just terrible. Sure, throw away all devices with only 8GiB of RAM (or less) because this single app wants to burn through 4GiB of RAM, because the developer thinks everything should be constantly in memory and can't be bothered to optimize the application in the slightest, because it was developed on a 20-core system with 64GiB of RAM and it ran ok there.
This is the kind of mentality that leaves MS Teams developers proud that their new and improved chat client only takes 3 seconds to switch between chats.
New_Enthusiasm9053@reddit
Mate, if there's 6GB of user data then keeping it in memory is fine. You could write Excel to only load the data that it needs, sure. But you can't write a game that way because the latency is too high. It's not WASM's job to restrict the developer, and wasteful code can be written anyway. Not having 64-bit support actively blocks the development of highly optimized software that just does complex stuff in real time. WASM is meant to be a pseudo-assembly, and we moved away from 32 bits over a decade ago for good reason.
4GB is only enormous if you restrict yourself to tasks that don't need a lot of memory.
I personally write efficient code, but if I can make the user's life better by using memory then I will. Everything has a space-time complexity. Sometimes you trade time for space and sometimes space for time.
Either way, it's not WASM's job to tell the developer what tradeoff to make.
Chisignal@reddit
4GB is obviously pretty obscene in the context of websites as hypertext documents, but keep in mind that WASM is, as its name suggests, quite literally assembly (for the web). It's intended precisely to serve those applications that are rich, complex and demanding, like movie or photo editors or IDEs. It's more akin to native applications being limited to 4GB, which would be pretty absurd.
simonask_@reddit
32 bits can address 4 GiB of memory (minus one byte).
The reason you may want a larger address space is not to use it as an allocation heap, but rather to do interesting things like memory mapping.
Ronin-s_Spirit@reddit
Right, I forgot alignment and counted bitwise, silly me.
simonask_@reddit
So it makes sense that exposing a full 64 bits of address space would not be great, but a 64 bit pointer would still be required to represent other interesting virtual address space sizes, like 34 bits (16 GiB), or similar.
You could still do bounds checking via hardware traps with such an address space, even though it would require 64-bit pointers, no?
Peanutbutter_Warrior@reddit
No. If you've got a 32 bit pointer then there is no value you can give that pointer which can address more than 4 GiB. If you've got a 64 bit pointer, even if it's supposed to only be 34 bits, there's nothing stopping you making a pointer which is more than 34 bits.
__david__@reddit
The compiler could emit an AND on the pointer to wrap it to 34 bits before every dereference. Performance-wise that might land between 32-bit mode and full bounds checking, since it doesn't kill the branch predictor.
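A sketch of what that masking does (`wrap34` is a made-up name): an out-of-range offset silently wraps back into the 16 GiB region rather than trapping, which is the "bounds uncheck" behavior debated earlier in the thread.

```c
#include <stdint.h>

// Mask a WASM offset down to 34 bits (16 GiB) before dereferencing.
// An out-of-range offset wraps into the valid region instead of
// trapping, trading OOB detection for branch-free accesses.
uint64_t wrap34(uint64_t wasm_ptr) {
    return wasm_ptr & (((uint64_t)1 << 34) - 1);
}
```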
Ok-Scheme-913@reddit
That would have basically zero performance overhead; the worst effect would be the extra code size. CPUs have a very large window for arithmetic operations, so adding more will still finish way earlier than a memory load does.
But it could also be done at the creation of pointer values, not at deref (since the compiler can track reference-taking/casts from ints).