wannaliveonmars@reddit
The more I read, the more it seems that C will remain the most efficient language for speed, for the time being. It's not so much the language itself, but it seems the presence of an npm-style package manager in any language leads to bloat in programs caused by too much reliance on third-party libraries.
I still can't imagine a plain C program that would use up 900MB of RAM unless it's a video game. They are still the only programs that can pack real functionality in a 100-200 KB executable.
MaybeADragon@reddit
Having a package manager doesn't increase memory usage of a compiled program, other than for the increased binary size that additional crates typically cause (which obviously is in the order of KB not MB).
wannaliveonmars@reddit
That's not what I meant. A package manager creates the social dynamic of programmers using lots of packages, and of packages using other packages. For example, the famous npm isStringNullEmpty package that's used by millions of programs.
It makes programmers pull in massive chunks of code when they use only 5-6 lines from a package that they could have written themselves.
With C, adding dependencies is painful enough to make programmers more likely to avoid it. Having a package manager changes programmers' behavior, similar to how cars cause people to walk less.
max123246@reddit
Using battle tested libraries leads to better software. It's the Unix philosophy, do one thing and do it well
Do you know how many linked list implementations there are in the Linux Kernel? And how many of them should actually be dynamic arrays but it was easier to implement a linked list?
MaybeADragon@reddit
As the other user said, you pay only for what you use. It's not like every crate just runs Box::leak over and over, or is stuffed full of telemetry.
meowsqueak@reddit
Pretty sure the linker removes any unused code.
I’d rather add a line to a toml file and a few milliseconds to my link time than spend a day or two debugging memory leaks and rare segfaults.
Owning a car, I walk less, but I’m also able to get more done.
fishy150@reddit
i'd guess that the equivalent C program in both cases would still be using that much memory, where the original program is a struct full of a bunch of big unions and the new program stores pointers to those values instead
kexxty@reddit
My memory is gone Bono!
jdehesa@reddit
Nothing revolutionary, but a good reminder that when you use Option (or std::optional in C++) you are still paying for the (non-heap) memory of the struct even if it is None.
pdpi@reddit
More generally, any enum costs you the width of the largest variant, plus the width of the discriminant.
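That baseline rule can be checked directly with size_of. A small sketch; the enum name is made up, and the exact figure assumes a typical 64-bit target where u64 has 8-byte alignment:

```rust
use std::mem::size_of;

// The largest variant holds a u64 (8 bytes, no spare bit patterns),
// so the discriminant needs its own byte, and alignment then pads
// the whole enum out to 16 bytes.
enum Payload {
    Small(u8),
    Big(u64),
}

fn main() {
    println!("u64:     {} bytes", size_of::<u64>());     // 8
    println!("Payload: {} bytes", size_of::<Payload>()); // 16 on 64-bit
}
```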
Horusiath@reddit
Not necessarily - Rust has ways to optimise Options depending on the case:
pdpi@reddit
Sure, and Option<Box<T>> is the classic case, where the representation is basically just the bare pointer. I was pointing out the general baseline rule, because these optimisations are still, IIRC, special-cased and not completely generic.
Lyvri@reddit
Box is just an RAII wrapper around NonNull, which has a non-zero invariant, therefore Option<Box<T>> always has the same size as Box<T>.
AresFowl44@reddit
I think they have been generic for a while now, no? They're definitely generic now.
valarauca14@reddit
This also applies to user-defined enums, if the compiler can prove no variant is zero
Successful-Money4995@reddit
How is that last one accomplished? Is it custom code to deal with that specific pattern?
encrypttwice04@reddit
it's called niche optimization, the compiler sees that some bit patterns are impossible for the inner type (like a null pointer for a reference) and uses those to represent None without extra space, it's automatic and kinda magical.
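The guaranteed cases of this can be observed directly; a minimal sketch using only layouts that Rust's documentation promises:

```rust
use std::mem::size_of;
use std::num::NonZeroU64;

fn main() {
    // A Box is never null, so the null bit pattern is free to mean
    // None: no extra discriminant byte is needed.
    assert_eq!(size_of::<Option<Box<u64>>>(), size_of::<Box<u64>>());

    // References share the same non-null niche.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // NonZeroU64 reserves the all-zeros pattern, so Option<NonZeroU64>
    // is exactly as big as a plain u64.
    assert_eq!(size_of::<Option<NonZeroU64>>(), size_of::<u64>());

    println!("all niche checks passed");
}
```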
PthariensFlame@reddit
It’s a special case of the “niche-filling optimization” that rustc performs; here is a good summary that was posted here a while back: https://herecomesthemoon.net/pdfs/mond-how-many-options-fit-into-a-boolean.pdf
zzzthelastuser@reddit
16 bytes:
For the Option you need just a single bit, the rest of the almost 8 bytes is enough to store the std::io::Result type. That's how I understand it. Please correct me if I'm wrong.
Successful-Money4995@reddit
It's surely 8 bytes for both error and u64 and the other 8 bytes is just the discriminant storing none, error, or ok.
Count the states. 2 to the 64 possible errors plus 2 to the 64 possible u64 plus a single state for a non error missing option. So the number of states is 2 to the 65 plus 1. Just barely 66 bits.
Rust is doing some magic to get the discriminant for the error and for the option to live in the same word, and that's the part that amazes me!
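The counting argument above can be checked mechanically with u128 arithmetic. This is a sketch of the reasoning only, not of rustc's actual layout, and it assumes the error type really did use all 2^64 patterns:

```rust
fn main() {
    // States a hypothetical Option<Result<u64, E>> would need if E
    // occupied all 2^64 bit patterns of its word:
    // 2^64 Ok values + 2^64 Err values + 1 for None.
    let states: u128 = (1u128 << 64) + (1u128 << 64) + 1; // = 2^65 + 1

    // Bits needed to distinguish that many states: the bit length
    // of (states - 1).
    let bits = 128 - (states - 1).leading_zeros();

    println!("{states} states need {bits} bits");
    assert_eq!(bits, 66); // doesn't fit in a single 64-bit word
}
```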
zzzthelastuser@reddit
I have only counted 41 error states (ignoring that each error state could carry additional information) and of course another state for the Ok result type.
I looked at https://doc.rust-lang.org/src/std/io/error.rs.html#458
Successful-Money4995@reddit
Either way, it's over 2 to the 64, so you have to use 128 bits in total. It's going to be a lot more convenient to encode it like I suggest than to make the option's None be 43. Also, if someone adds an error state, None needs to become 44 and it'll break backward compatibility.
I would still encode it the way that I said. I'm guessing that Rust is doing it like I said.
Horusiath@reddit
From what I've managed to test, it also works on custom enums as long as their tag values are not defined explicitly. I guess the rust compiler rearranges and packs the tag values of nested enums if it's safe to do so.
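That behaviour can be demonstrated with a tiny custom enum. Note this is how current rustc happens to lay things out, not a language guarantee; the enum name is made up:

```rust
use std::mem::size_of;

// No explicit discriminant values, so the compiler is free to
// renumber the tags and reuse the spare bit patterns.
enum Tri {
    A,
    B,
    C,
}

fn main() {
    // Tri uses 3 of the 256 possible byte patterns; current rustc
    // reuses one of the 253 spare patterns to encode None.
    assert_eq!(size_of::<Tri>(), 1);
    assert_eq!(size_of::<Option<Tri>>(), 1);
    // Even double-wrapping still fits in one byte.
    assert_eq!(size_of::<Option<Option<Tri>>>(), 1);
    println!("nested enum niches packed into one byte");
}
```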
encrypttwice04@reddit
and that’s exactly why the niche optimizations matter so much in practice, you just don’t see it until you look at the assembly
paulstelian97@reddit
Option<Box<T>> and Box<T> are the same size because of niche optimizations.
Dean_Roddey@reddit
The main place this hit me is in my error system. I have a single error in my whole code base (the point being that errors are really errors in my system and not looked at and reacted to) and it needs to be pretty rich for good post-mortem.
I quickly realized that I was returning 90+ bytes just to say Ok(()), which wasn't good. So I changed it to Box its contents internally instead.
BTW, one thing I take a LOT of advantage of is static refs. Even though my error type is quite rich, often that one main boxing allocation is the only one involved. The source file is a static string ref. If the caller doesn't need to format values into his error text, that's a static string ref. The core error info (crate, error name, and short description) is all generated via my code generator, so that's just one static ref to a generated struct for the given error. The call stack just needs a static string ref and line number per slot.
So it is still super-efficient relative to the amount of information it provides. The logging system uses the same type, so errors are trivially logged without modification and logging gets the same efficiency benefits.
Arcuru@reddit
I hit this a lot with my error types as well. I like to add lots of detailed info to my errors so there's enough context to debug with. Code Link
With every module having its own error type, things quickly grow out of hand when one of the paths needs to return a lot of data.
My solution was just to start Boxing every returned Error. The Error path is the uncommon path anyway, so there's no point in trying to keep it inline.
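The boxed-error pattern described in these comments can be sketched roughly like this; the field names are illustrative, not anyone's actual code:

```rust
use std::mem::size_of;

// A deliberately large error payload, kept behind a Box so the
// happy path doesn't pay for it.
struct ErrorDetail {
    file: &'static str,  // static ref: no allocation needed
    line: u32,
    message: String,     // only allocated when formatting is needed
    context: [u64; 8],   // stand-in for rich post-mortem data
}

// The public error type is just one pointer wide.
struct Error(Box<ErrorDetail>);

fn main() {
    // The detail struct is big...
    println!("ErrorDetail: {} bytes", size_of::<ErrorDetail>());
    // ...but Box is non-null and Ok(()) carries no data, so the
    // whole Result collapses to a single pointer-sized word.
    assert_eq!(size_of::<Result<(), Error>>(), size_of::<usize>());
    println!("Result<(), Error>: {} bytes", size_of::<Result<(), Error>>());
}
```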
matthieum@reddit
I must ask: why not Option<Box<str>> for the strings?
This costs 16 bytes, instead of 24 bytes, shaving off 1/3 of the footprint.
MaybeADragon@reddit
Maybe it's for a crate? I personally avoid Box for anything I intend to publish because I have no idea if a user (of which I have 0) is going to want or need mutability.
apetranzilla@reddit
Box::<str>::into_string allows you to convert a boxed string slice into a String with no allocation or copying, so it's not too expensive for a user to convert, mutate, and box the string if necessary.
MaybeADragon@reddit
I agree, but that's the sort of thing I personally believe a user should be opting into instead of forced into. I personally wouldn't mind Box as a default everywhere since I use it internally a lot, but a newbie who just came out of rustlings or the book would probably be put off.
It's a flavour thing really; as long as the API picks one and sticks to it, it's fine, but I just think String and Vec are sane defaults, so I stick to them.
GlowingBadger175@reddit
this is a really helpful tip for saving memory
ericonr@reddit
Besides the complexity of a profile specific to measuring memory usage, there's the fact that different allocators manage memory differently, and will get you different fragmentation and performance results.
At least on UNIX-like systems, you have access to getrusage, which will tell you memory statistics without impacting your runtime in any way.
https://man7.org/linux/man-pages/man2/getrusage.2.html
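A std-only way to get similar numbers on Linux is to read /proc/self/status instead of calling getrusage directly (which would need the libc crate). This is a Linux-specific sketch; VmRSS and VmHWM are the kernel's current and peak resident-set fields:

```rust
use std::fs;

// Parse a "/proc/self/status" line like "VmRSS:\t  123456 kB" into KiB.
fn parse_kib(line: &str) -> Option<u64> {
    line.split_whitespace().nth(1)?.parse().ok()
}

fn main() {
    // Linux-only; other platforms would use getrusage(2) via libc.
    if let Ok(status) = fs::read_to_string("/proc/self/status") {
        for line in status.lines() {
            // VmRSS = current resident set, VmHWM = peak ("high water mark").
            if line.starts_with("VmRSS") || line.starts_with("VmHWM") {
                if let Some(kib) = parse_kib(line) {
                    println!("{}: {} KiB", &line[..5], kib);
                }
            }
        }
    }
}
```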
Tornado547@reddit
what if there were a string that stores its cap and len in the same allocation as the string data. getting an &str pointing to it would be trivial and the only problematic part would be having to have another copy of methods that take &mut String.
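Not an implementation of that idea, just the size arithmetic behind it, assuming a 64-bit target:

```rust
use std::mem::size_of;

fn main() {
    // String = (ptr, cap, len): three words.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());

    // Box<str> drops cap, but the fat pointer still carries len: two words.
    assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>());

    // A "thin" string keeping cap and len inside the allocation would
    // be a single plain pointer: one word.
    println!("a thin string handle would be {} bytes", size_of::<*const u8>());
}
```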
Tornado547@reddit
Option::<ThinString>::None would be the same size as Option::<Box<String>>::None, but an Option::<ThinString>::Some has the same amount of indirection and memory allocation (is fragmentation still sth to care about in the world of infinite vaddr space?) as an Option::<Box<String>>::Some.
am i cooking chat
KingOfTheTrailer@reddit
I'd say so. A fragmented heap will consume some physical memory. A bad allocation pattern could in theory eventually consume all physical and swap memory in a long-running process.
vytah@reddit
There's the thin-string crate, which does exactly that.
And for thin immutable strings (so you can feel like programming in a higher-level language), there's the arcstr crate.
ng37779a@reddit
Nice article! Well written. Box isn't just about saving memory—it's about making indirection explicit where other languages hide it behind abstractions. Many avoid allocations, but the real skill is knowing when the cognitive overhead of avoiding Box outweighs the performance gain. Memory-first thinking works until you hit the debugger trying to track ownership across complex data structures.