Yes, because it wasn't designed to be used by experienced programmers who can make effective use of many of the wonderful language features developed over the past 30 years. Those were intentionally thrown away in the name of simplicity.
Instead, it was explicitly designed to be used by short term contractors who need to pick up a project, shit out a feature, and then move on as quickly as possible. The language needed to be simple to a fault, and good for churning out low to medium quality code that can be easily tested and thrown into production. Whatever feature the code implements doesn't need to live very long because Google will probably shelve the project before the code quality becomes relevant.
I'm used to seeing the SP saved at a fixed offset from *FP, so at runtime you don't need to keep track of space added to the stack frame due to C VLAs or alloca(), you just move the saved SP to SP before returning. Does ARM mandate doing things differently?
The problem here is that for large offsets ARM can't encode the SP subtraction in an immediate, so it splits it into two instructions, and if the program gets paused after the first instruction but before the second one, the SP will be invalid.
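To make that concrete, here's the situation at the C level as I understand it (purely illustrative, not the actual Go runtime code):

    /* a sketch of why dynamic stack allocation forces the epilogue to
     * restore SP from a saved value instead of adding back a constant;
     * names and sizes are illustrative */
    #include <stdio.h>

    void frame_with_vla(int n) {
        int scratch[n];               /* C VLA: frame size unknown at compile time */
        for (int i = 0; i < n; i++)
            scratch[i] = i;
        printf("%d\n", scratch[n - 1]);
        /* the epilogue can't do "add sp, sp, #CONST" here because CONST
         * isn't known; it restores SP from the frame pointer (or a saved
         * SP slot) instead, e.g. "mov sp, x29" on arm64 */
    }

    int main(void) {
        frame_with_vla(100);
        return 0;
    }

For a fixed-size frame the compiler prefers the pure SP-offset form, and that's where the encoding limit bites: as I understand it, the arm64 add/sub immediate field is only 12 bits (optionally shifted), so a large enough frame adjustment gets split across two instructions, and a signal landing between them sees an SP that doesn't point at a valid frame.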
Debugging low-level runtime issues always highlights just how complex modern software stacks truly are. The methodical approach to isolating the bug—in particular, brainstorming root causes and synthetically reproducing edge cases—feels like a textbook example for tricky multi-threaded bugs. It's a reminder that even well-tested systems can hide deep, subtle problems, especially when compilers and runtime interact so closely. Kudos to the team for their persistence and for sharing the process!
Is it just me or is the writing style really weird? It feels like they reused a number of phrases an awful lot:
It was clear that...
felt pretty confident that...
remote from the [ root cause | actual [ bug | crash ]]
And the story seems to loop back on itself: they found a reported bug that they were pretty confident was related, but there was nothing there they didn't already know. So they did some more testing (of what?), which was unsuccessful; but recalling that there's a bug, it might be a runtime bug, but then again it might not, and by the way remember that netlink is involved, and the crash is remote from the root cause....
It feels like either a whole lot of detail was edited out, or they needed to pad the content, or it was partially authored by LLM.
It's the story of investigating a multi-threaded race condition resulting in stack corruption, (IMO) one of the worst classes of bugs to investigate and hardest to solve. It isn't deterministically reproducible and the evidence you are working with often doesn't contain the information you need to find the root cause.
Due to this, the nature of the investigation is brainstorming all possible causes ranked by likelihood, then trying to exacerbate each postulated cause by synthetic means such as writing a small program that is more likely to cause the failure due to its exaggerated use of, say, stack size, memory allocation, thread creation or garbage collection. Each postulated cause is looked into and either yields further evidence or comes to a dead end, which in itself is useful.
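For example, a synthetic reproducer for a suspected stack/threading race might be no more than this kind of loop (entirely illustrative, written in C here; the actual investigation concerned Go):

    /* stress harness (build with -pthread): churn thread creation and
     * stack usage to raise the odds of hitting a stack-related race */
    #include <pthread.h>
    #include <stddef.h>

    static void *churn(void *arg) {
        (void)arg;
        volatile char pad[64 * 1024];   /* force real stack growth */
        pad[0] = 1;
        pad[sizeof pad - 1] = 1;
        return NULL;
    }

    int main(void) {
        for (int round = 0; round < 1000; round++) {
            pthread_t t[64];
            for (int i = 0; i < 64; i++)
                pthread_create(&t[i], NULL, churn, NULL);
            for (int i = 0; i < 64; i++)
                pthread_join(t[i], NULL);
        }
        return 0;
    }

Nothing about such a program is clever; the point is just to make a rare interleaving less rare.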
So I can excuse the re-use of language detailing the issue ("remote from the root cause"), spelling out the most likely cause ("felt confident that") and discussing the facts gleaned from testing ("it was clear that"). Having said that, I think you're being nitpicky, as it's a reasonably in-depth blog post and I only see:
It was clear that
1 usage
felt pretty confident that
2 usages
remote from the [ root cause | actual [ bug | crash ]]
1 usage of each of those alternations
Not particularly repetitive in my opinion and without
💪 An unprofessional number — of emojis
🚀 Far too many — bullet points
⭐ Constant bolded — first words of a bullet point list
🏈 Em-dashes — as far as the eye can see
I don't think this is AI generated; in fact it feels entirely wetware produced.
All fair points, and I do see that the listed author has a background that makes it likely they were involved in troubleshooting this.
It's very possible that their writing style is a little less polished than their engineering and root cause analysis skills, and that's totally forgivable. And I do suspect that there was some necessary editing that may have led to the stilted style.
Given that the Cloudflare blog is more of a collection of its engineers personally writing things up, I prefer the slightly longer, more invested style of writing from the people who actually worked on the issue. Toning it down would likely make this a less interesting read, even if it were more efficient.
razialx@reddit
Excellent write up. I don’t work in go myself but I feel like I learned a bit about it today.
wd40bomber7@reddit
Yeah, pure usermode thread preemption is very tricky to get right! I was impressed to learn it's part of the Go runtime.
Ameisen@reddit
There's also an advantage over preemptive multitasking that you have significantly less task switching, and thus avoid the overhead associated with that.
Some games still use fibers for that very reason.
cat_in_the_wall@reddit
I don't think that is a matter of implementation difficulty. Async/await has a ton of benefits just by its nature. One can argue green threads vs async/await until you are blue in the face, but the answer is: there's no right answer.
I prefer async await myself, having full control of suspension points allows me to do fancy things without arguing with the runtime. "race these, unless that, but cancel the whole thing if, but only unwind to here..."
surely somebody will say it is all possible in green threads. which is probably true. just doesn't work as well for me.
wyldstallionesquire@reddit
Green threads ARE cooperative. Ie, asynchronous await.
I’ve seen the debate happen back and forth, so it’s not universally accepted, I’ll admit.
cat_in_the_wall@reddit
green threads canonically are preemptive userspace scheduling. this is not cooperative. go sucks ass, so if they didn't do it at first, that is ass sucking. so i think you may have your history confused.
wyldstallionesquire@reddit
https://en.wikipedia.org/wiki/Green_thread
I know, I know, Wikipedia, but it’s enough to show there’s not a clear consensus
happyscrappy@reddit
I might call that a bug in the runtime. It's not illegal to do what the compiler did. It's just the runtime assumes that the SP always points at a valid frame.
Either way, since the compiler is paired with the runtime it's a big screw up.
I honestly didn't know having your runtime crawl the stack routinely was acceptable in this day and age. I know it's how GC used to work back in the day (1970s LISP machines, etc.) but nowadays things like BAT (big ass tables) seem like the way to go. See "zero-overhead" exceptions for example.
Any attack on the code (injection) which can alter the stack has a good chance of being able to create a DOS because of this. Just smash any part of any stack and then wait until the GC barfs on it.
knome@reddit
As far as I am aware, any modern GC would have the same need to scan the thread stacks to determine roots as well.
I can't find any technical references to any specific techniques or technologies referred to as "Big Ass Tables"
Certainly not in relation to GC.
What do zero overhead exceptions have to do with anything? Zero overhead exceptions just means deferring allocation and exception construction in such a way as a normally running thread won't have any slow down unless an exception is actually used. Are you referring to unwind tables that store stack-info to help efficiently unwind the stack and land in the right catch/except?
any kind of stack-injection has a good chance of fucking up your runtime regardless of GC technique. your stack's invariants just got fucked. all bets are off for normal running. no language protects against screwing up so badly someone is spraying your stack with arbitrary data. there are mitigations, but you're already in trouble.
do you have any sort of links to references on any of these things?
at the moment, without having more information, this comment just reads like confident nonsense, but I am very willing to learn if I'm simply unaware of some new developments.
happyscrappy@reddit
Not all GCs are even mark and sweep.
You're a real pip.
It was an example. How you use big ass tables to mark down the information relating to a context instead of trying to reconstruct state by scanning execution state.
With garbage collection instead of just keeping all your information hidden within the runtime stacks as a side effect of using the data you can do things like reference counting. You could even put active items in a separate table (a big ass one) and remove them when no longer in use. In this way you don't have to crawl your stacks to find out what is active.
Yes. Once you can alter the stack, chances are good you can create a DOS regardless. This can make it easier, if you have cases where the values (or the reach of offsets) available to you are limited. But ultimately, it's generally just making the job easier, not bringing it from impossible to possible. It might help you obfuscate the attack some more too: as we see here, if the crashing code is less proximate, it becomes harder to even figure out what is going wrong so you can get to fixing it.
Check out the badass here.
knome@reddit
reference counting is slow, requiring writes for every object referenced and dereferenced. not to mention it's broken as soon as objects form a reference loop.
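a quick sketch of both problems, with made-up names:

    #include <stdlib.h>

    typedef struct Node {
        int refs;
        struct Node *next;
    } Node;

    static Node *node_new(void) {
        Node *n = calloc(1, sizeof *n);
        n->refs = 1;
        return n;
    }

    static void node_ref(Node *n) { n->refs++; }            /* a write on every share */

    static void node_unref(Node *n) {
        if (n && --n->refs == 0) {                          /* a write on every drop */
            node_unref(n->next);
            free(n);
        }
    }

    int main(void) {
        Node *a = node_new(), *b = node_new();
        a->next = b; node_ref(b);
        b->next = a; node_ref(a);       /* cycle: a <-> b */
        node_unref(a); node_unref(b);   /* both counts stop at 1: leaked forever */
        return 0;
    }

every share and every drop is a memory write, and the cycle at the end is never collected.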
if you use and define an acronym as if it's an actual technical term, you shouldn't be surprised when someone assumes you were referencing an actual technical term. "Big Ass Tables" would hardly be unusual as such a term if it actually was one.
It's irrelevant.
You still have to walk the stack to get the actual object references, even if the shape of the stack is outlined in metadata. It's very common for that to be the case. Most stack-scanning is precise in modern GC, so that instead of having to guess whether a value is an integer vs a pointer, the GC can be confident of the data type. Imprecise GC was a questionable technique previously used to avoid having to build stack-frame knowledge or metadata-tables into the GC, at the risk of integer/pointer collisions causing hanging objects from false roots.
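to be concrete about what precise scanning means, a toy version of the metadata (layouts and values all invented):

    #include <stdint.h>
    #include <stdio.h>

    /* per call site, which word-offsets in that frame hold object
     * pointers, so the collector never has to guess pointer vs integer */
    typedef struct {
        uintptr_t call_site;   /* return address identifying the frame layout */
        uint32_t  ptr_slots;   /* bitmask of pointer-holding slots */
    } StackMapEntry;

    /* the kind of table the compiler would emit */
    static const StackMapEntry stack_maps[] = {
        { 0x1000, 0x5 },       /* at call site 0x1000, slots 0 and 2 are pointers */
    };

    static void mark_root(uintptr_t p) { printf("root: %#lx\n", (unsigned long)p); }

    static void scan_frame(const uintptr_t *frame, const StackMapEntry *map) {
        for (int i = 0; i < 32; i++)
            if (map->ptr_slots & (1u << i))
                mark_root(frame[i]);    /* known to be a pointer, no guessing */
    }

    int main(void) {
        uintptr_t fake_frame[4] = { 0xdead0, 42, 0xbeef0, 7 };
        scan_frame(fake_frame, &stack_maps[0]);
        return 0;
    }

the stack still gets walked; the metadata just tells the collector what it's looking at.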
Again, this means updating some off-stack data structure every time you bring in or release a pointer on the stack. you've doubled or tripled your work, and will cause constant memory ownership shuffling if you've got multiple threads that can access and reference the same objects as the chips coordinate to update either the objects or your big tables of object-pointer-to-reference-count data.
You could even put active items in a separate table (a big ass one) and remove them when no longer in use.
separate table from what? the table with your dead/unreachable objects? if you can tell the difference, you'd just collect them. you can't in a mark-and-sweep, because release just means you stopped using a pointer or overwrote it, and you don't have the information available at that point to determine if the object is active or not. in a reference-counting system, you could, but a reference loop would mean your data is never collected at all and remains 'active' indefinitely after your program stops referencing it.
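here's a toy mark-and-sweep to show what I mean. nothing happens at the moment you drop a pointer; liveness is only discovered at collection time by tracing from roots (synthetic heap and roots here, a real runtime gets its roots from the thread stacks):

    #include <stdio.h>

    #define HEAP 8

    typedef struct Obj {
        int marked;
        struct Obj *child;
        int in_use;                /* allocator bookkeeping, not GC state */
    } Obj;

    static Obj heap[HEAP];

    static Obj *alloc_obj(void) {
        for (int i = 0; i < HEAP; i++)
            if (!heap[i].in_use) {
                heap[i] = (Obj){ .in_use = 1 };
                return &heap[i];
            }
        return 0;
    }

    static void mark(Obj *o) {
        while (o && !o->marked) { o->marked = 1; o = o->child; }
    }

    static void collect(Obj **roots, int nroots) {
        for (int i = 0; i < HEAP; i++) heap[i].marked = 0;
        for (int i = 0; i < nroots; i++) mark(roots[i]);    /* trace from roots */
        for (int i = 0; i < HEAP; i++)
            if (heap[i].in_use && !heap[i].marked) {
                heap[i].in_use = 0;                         /* sweep */
                printf("collected slot %d\n", i);
            }
    }

    int main(void) {
        Obj *a = alloc_obj();
        a->child = alloc_obj();
        alloc_obj();               /* allocate and immediately lose the only pointer */
        Obj *roots[] = { a };      /* in a real runtime: whatever the stacks hold */
        collect(roots, 1);         /* the orphan is unreachable and gets swept */
        return 0;
    }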
so you'd instead just have to keep all that data in a separate stack of references, which is silly, because your thread's stack is a perfectly fine data structure to extract references from without needing to double up your writes everywhere.
the only real mitigation I'm aware of for buffer overflows, aside from using a proper language that makes sure they can't happen, is address space layout randomization (ASLR), which can be used to avoid having well known attack points for overflows to try to overwrite your return address with, making exploitation more difficult
If you had been referencing some new technique for optimizing GC, I would be interested. It wouldn't be the first or last time I happen across references to unfamiliar tech.
however, it seems you were, indeed, just making shit up.
happyscrappy@reddit
Dereferenced is a bad choice of word here, as it conflicts with how you access objects. Probably say "go out of scope".
And how is that relevant? You said you must do such and such. Now you just say other possibilities are slow. Are you trying to create your own goalposts?
You're a real pip.
No. That's not at all true. You can write down references somewhere else.
Now you're catching on. After telling me I'm wrong and you have to walk the stack now you say the same thing to me I said to you.
It doesn't have to be a stack. It can be other kinds of big ass tables. There's more than one kind of data structure.
No, it's not perfectly fine. It's full of all kinds of other information that isn't relevant. That makes it larger. It also is spread out across more memory, which isn't great in a locality-of-use way.
You don't have to double anything up. If you take it out of the call stack you can put it elsewhere and only there.
Not that important when trying just to create a DOS, right? You don't need to make valid pointers and you don't have to write over return addresses. ASLR is not a very good method anymore. It means the values aren't static, but they are pretty easily determinable. And it isn't really that much about hiding the locations on the stack (return addresses), since there are so many things on the stack which point back to the stack. Hell, the SP points to the stack!
ASLR is really more about DATA and TEXT sections, global variables and code.
Anyway, it didn't save Jamal Khashoggi:
https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html
It slows down attackers, but all current exploit chains (and there are a lot of them) work around it by finding addresses and using offsets from them. Then when the stuff moves around it still finds it. There are other ways to beat it too.
Right. You assumed that all GC is mark and sweep and I'm making stuff up.
I say to you that you can store it elsewhere in tables, and you say I'm wrong, then you say you could store it elsewhere and that I'm making stuff up.
Come on, you sound ridiculous.
I'll fill out more of what I meant by referencing exceptions.
When exceptions were new, they were just setjmp()/longjmp(). You saved every CPU register just in case you might take an exception and then restored them if you did. This was perfectly fine: the person who first used them coded them up and then went on to see how they could be used. No use spending a lot of time gilding the lily before you know there is a lot of payoff for doing it. Once you realize they are useful you start to refine how they are implemented. Once you know there are millions of people using them there is a huge advantage to optimizing the implementation greatly. And that's how exceptions progressed.
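In C that era looked roughly like this (a sketch of the idiom, not any particular compiler's output):

    #include <setjmp.h>
    #include <stdio.h>

    /* setjmp() stores the callee-saved registers plus SP/PC into the
     * jmp_buf on every "try", whether or not anything ever throws */
    static jmp_buf handler;

    static void might_fail(int bad) {
        if (bad)
            longjmp(handler, 1);      /* "throw": restore the saved registers */
    }

    int main(void) {
        if (setjmp(handler) == 0) {   /* "try": pay the save cost up front */
            might_fail(1);
            puts("no exception");
        } else {
            puts("caught");           /* "catch" */
        }
        return 0;
    }

The unconditional register save at every try is exactly what the later table-driven designs got rid of.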
They went from saving everything on the stack just in case to saving only the active registers. Then they changed the compiler to know which registers are changed between a try and catch and only save those. Then they changed the compiler to try to minimize the amount of state that changes so that less has to be saved. They even went to an idea of getting all the stuff that changes out of the register file (to the stack) so you didn't have to save it. Others pointed out this was false economy since it's basically doing all the saves just in case again.

As this was done, the method of catching got a lot more complicated. And so they created a "bag on the side", a bag of data used to clean up on a catch, instead of putting all that on the stack. This data is indexed by the IP (PC) where the exception is taken, because it'll be the same each time even though the state on the stack will change. So the tables could be static and you didn't need to make a lot of copies of them on the stack.
All this was putting a lot more effort into improving exception implementation because it was clear a lot of people were using them. It is, at its core, the same idea as making your compiler better. Sure, it may take 30 hours to make an improvement, but if you have millions of users of the compiler it pays off.
So this shows how a technology can be improved once it is clear how important it is. And for you to suggest that the state of the art in garbage collection is 50 years old, the mark and sweep used by those 1970s LISP and Smalltalk machines, well, it doesn't really make any sense. I'm not throwing shade on those who developed GC back then. But if it moved into something used by a billion people on the planet and no one invested time in making it more efficient by not walking every stack in the system? Well, that doesn't make sense to me.
As an aside, I don't really accept the idea that reference counting is slow. Much as with exceptions you get the compiler involved and it has a lot of smarts. It can make the counting up and down a lot more efficient than a naive implementation is. It can remove reference counting completely for things which are not shared (despite being marked as sharable). As you can see by this post. And all these values are relatively local to the other accesses, meaning they don't blow out your cache unlike a sweep of every frame in every stack in your task.
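A sketch of the kind of elision I mean, done by hand here but it's the sort of transformation a compiler can prove safe (names made up):

    /* the caller holds a reference for the duration of the call, so the
     * ref/unref pair in sum_naive is provably redundant */
    typedef struct { int refs; int value; } Obj;

    static void obj_ref(Obj *o)   { o->refs++; }
    static void obj_unref(Obj *o) { --o->refs; /* free at zero in a real runtime */ }

    /* naive codegen: protect the argument across the call */
    static int sum_naive(Obj *o) {
        obj_ref(o);
        int v = o->value + 1;
        obj_unref(o);
        return v;
    }

    /* optimized: the pair is dropped, the reference is "borrowed" */
    static int sum_optimized(Obj *o) {
        return o->value + 1;
    }

    int main(void) {
        Obj o = { 1, 41 };
        return sum_naive(&o) == sum_optimized(&o) ? 0 : 1;
    }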
But ultimately I leave that up to others to investigate. I don't spend a lot of time benchmarking mark and sweep or reference counting. I only recently even acknowledged that something of this sort was necessary, that you really can't ever erase use after free with explicit memory management. I see that as a problem but haven't found any sweep methods which I find to be suitable. Every one of them seems to try to make itself look better by simply leaving more garbage around. The less often you sweep, the less slowdown you have. At least slowdown measured in task execution time. Whether you are losing out due to the deleterious effects of memory overcommit isn't quite as well defined. And then there are systems which don't even have memory overcommit (small embedded systems) and so are not well suited to these types of GC.
knome@reddit
sure, dereference is overloaded from grabbing an object at the end of a pointer. we can use ref/unref or copy the linux kernel and use get/put. fine.
this is bad in a different way. overwriting values in collections or overwriting variables with new values doesn't close any kind of scope, yet still derefs the object and potentially prunes the object from the reachability tree.
its entire purpose is to make it harder for a hijacked stack to jump to well-known functions in executables. it has no other purpose.
har, har, I missed a >
you know what screams efficiency? writing out an object pointer to a second data structure in addition to incrementing the reference count on the object every time the stack uses an object.
1970s GCs were stop-the-world and sometimes copying to avoid memory fragmentation (though it halved your memory since that period used two equal sized memory arenas while copying objects around). there are modern mark-and-sweep collectors that collect concurrently with program use, use multiple gc memory arenas to reduce time spent on collection, even some that promise sub-5ms stops during collections, keeping runtime steady and predictable.
you find me a mark-and-sweep GC for a modern language that doesn't read the stack for identifying roots and we can discuss the merit of your ideas here.
the only modern language using reference counting as its basis is swift. popular languages using reference counting include python, PHP and perl, all of which were created in the late eighties through mid-nineties, and are slower than molasses in deep winter. they use reference counting because it was thirty years ago and writing a tracing GC was basically black magic without modern ease of access to computer science materials. reference counting is a lot easier for a mostly offline dev to work with (all had access to the web, but resources available were far less than today). additionally, python includes a half-baked tracer because early 2.* series python interpreters would get absolutely trashed by ref cycles eating all their memory. looking into it, apparently, PHP has one as well. perl takes it on the nose. as does swift. both will allow the user to create ref cycles that just sit around eating up memory indefinitely afterwards.
I assumed you were making stuff up because you claimed that modern GCs don't scan the stack, which is a ridiculous claim.
trying to say you were talking about reference counting is nonsensical, because no one would even mention anything about stack-scanning in a discussion where they were thinking about reference-counting. it's not part of it.
reference counting isn't without its uses. hell, erlang, a language whose multitudinous greenthread stacks can't form reference cycles but which still uses mark-and-sweep compaction to dispose of garbage, uses reference counting to share large binary objects between threads. this is useful to avoid compacting large blobs and copying large blobs to move them between threads, and because blobs can't reference anything, the references are always one way: a blob can't be part of a cycle because it doesn't reference anything.
if you take all the values out of your call stack, whatever you're sticking them in, that holds all of your frame data, is a call stack. just done by hand and worse for no reason. what, you keep pointers to your big table on the stack itself? what happens when you copy them or move them around? are you effectively two-tiering your reference counting so you have stack->stack-refs->actual-objects? it might be quicker in some cases by localizing most RC tweaks, but it would still be constant bit flipping in there to juggle liveness references from the stack. every time you reference something new you'd have to update your mid-table, and the actual-object, and you're now dealing with what amounts to a reference-indirection-cache, with all the additional problems that a cache will introduce (cache growth, compaction, value ejection: immediate vs keeping some kind of LRU queue live to avoid rapidly pinging actual-objects). walking an iterable would require you to create a ref to the actual-object, a mid-tier object, and reference that from the code (which now has to know how to track mid-tier objects and keep them updated and trigger actual-object collection when they die or the actual-object dies) for each item in the iterable, every time you iterate it.
yes, if someone starts talking about modern GC I'll assume they're talking about some variant of mark-and-sweep, because that is the norm.
happyscrappy@reddit
A compiler writer will definitely tell you otherwise. Did you know modern compilers often model a variable being overwritten as the previous variable going out of scope (end of life) and a new one being made? The old variable value is no longer accessible. It's out of scope.
Absolutely not. DATA sections move too. Stacks move.
'ASLR randomly arranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap and libraries.'
https://en.wikipedia.org/wiki/Address_space_layout_randomization
Why would I do both of those things? You have now twice created situations where extra work is done out of nothing.
That's not how it works. It only happens when the object enters scope or exits it. You can use it as many times as you want between those.
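Sketch, with made-up names:

    typedef struct { int refs; int value; } Obj;

    static void obj_ref(Obj *o)   { o->refs++; }
    static void obj_unref(Obj *o) { --o->refs; /* free at zero */ }

    static int use_many_times(Obj *o) {
        obj_ref(o);                   /* one write: it enters scope */
        int total = 0;
        for (int i = 0; i < 1000; i++)
            total += o->value;        /* a thousand uses, zero counting */
        obj_unref(o);                 /* one write: it exits scope */
        return total;
    }

    int main(void) {
        Obj o = { 1, 1 };
        return use_many_times(&o) == 1000 ? 0 : 1;
    }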
We've got a basic issue here. You don't understand what you are talking about. This seems like an intractable problem.
Not relevant. I didn't confine myself to mark-and-sweep. You confined yourself to mark and sweep. And you really didn't think this idea was so dumb when you were suggesting it:
But when I suggest it you're litmus testing me and ridiculing it. You're being dishonest.
Rust uses a system of tracking akin to reference counting. And it does it in the compiler instead of the runtime. Similar to how I described. It does not use mark-and-sweep.
That's not what I said. I said I couldn't see how a GC that scans stacks (there's not just one stack) is modern. I didn't say there weren't systems in use that work that way.
Again, I don't confine myself to mark-and-sweep. If you think it's nonsense then it's because you only have a hammer and you think everything is mark-and-sweep. If you take a step back you might see that I'm saying I think mark-and-sweep by scanning stack frames is outdated. Right or wrong, this is not somehow inconsistent or nonsensical as you are trying to make it out to be.
No. A call stack has the return address of functions on it. This doesn't have those so it isn't a call stack. In addition, you are again pretending I said that I would put this stuff in a stack. I never said any such thing. It is you that is stuck on stacks. I said there are other kinds of data structures.
And I'm tired of saying it. I'm tired of you trying to shove everything I say through your mark-and-sweep lens. You screwed up assuming that mark-and-sweep is the only way to do things. You screwed up assuming the only data structure is a stack. And I'm exasperated with your failures on these fronts. I'm done.
knome@reddit
Yes, I'm familiar with SSA. The internal representation used during compilation doesn't make your statement not silly.
Yes. They all move. Why do they move? To make it so payloads in stack smashing attacks can't find anywhere to gain purchase.
You have maintained you'll keep references off the stack by keeping them "somewhere else". It's a massive handwave. Your "big ass tables" don't mean anything. If you can cogently describe what you mean, I'll be all about it. Right now, you're giving me nothing.
that would be using the value, yes. what other meaning could it have? every time the name is mentioned? that's an exceedingly uncharitable reading of what I wrote to the point of purposeful misinterpretation.
for most languages, usage would be whenever the stack local refs a value from a struct or collection-type (or global). even stack-local to stack-local unless the compiler is smart enough to perform reference count liveness tracing on the stack and maintain only one ref while allowing multiple local references to exist, essentially batching their liveness together into a single ref. I'd wager none of python, PHP or perl pull off that trick. swift probably does.
I was quoting you and missed typing the > in order to make it a quote. which I pointed out the first time you pretended I was repeating you.

arc (atomic reference counted) is just a scope guard type like a unique or shared ptr in C++, but with support for rust lifetimes. rust uses manual memory management, but supports RAII-like patterns using types that trigger heap collection/dereference. I wouldn't consider Rust to have GC any more than I consider C++ to.
(yes, I'm aware of the Boehm collector, and no, 99% of C++ doesn't use it. probably add a few 9's there.)
that's not how any of this works. if you're going to pretend Rust folding constants etc. is some kind of compile-time GC, please don't.
asserting that a technique isn't modern is the same thing as asserting modern things don't use that technique.
all modern mark-and-sweep GCs use stack analysis. refcounting languages obviously don't, since there is no scanning involved in them, but refcounted languages are, with a single exception in swift, not modern. the three popular refcounted languages I mentioned are 30+ years old, and still using the same memory semantics because they've stayed backwards compatible over time, and ripping out a refcounted engine for a tracing one would be a huge pain in the ass that would break everything that interfaces with their engines at a low level.
what you said is "I honestly didn't know having your runtime crawl the stack routinely was acceptable in this day and age".
yes, it's acceptable, common and nearly universal for modern GCs.
which is why I'm being careful to note when I'm discussing each type, and the performance and implementation issues thereof.
it's great that you think 1970s-style reference counting is the bee's knees, but it's a wildly outmoded way to keep track of memory in a modern general-purpose managed runtime.
alright. fair enough.
so you've invented a 'frame-stack' that tracks everything that would be on the callstack, just off to the left of it, leaving only frame-stack-pointers, return-addrs and a bit of metadata on the genuine callstack. every new frame is now two bumps instead of one, and unwinding the stack means continuously following the pointers from the callstack over to the frame-stack to do what would otherwise have been done using local offsets on the stack itself. it also means collection can't just check the values dumped from the stopped thread's registers, so your program can't juggle values purely in registers; you'll have to spill them into the frame-stack to make sure they get cleaned up during an unwind.
what's the purpose of adding this pointless indirection?
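to make the indirection concrete, here's roughly the shape of what you're describing, sketched as go structs. every name here is mine, invented, since you haven't supplied any:

```go
package sketch

// all names invented; this is my best reading of the proposed scheme,
// not anyone's real runtime.

// what stays on the genuine callstack: metadata plus a pointer over to
// where the values actually live.
type callStackEntry struct {
	returnAddr uintptr // still needed to get back to the caller
	framePtr   *frame  // the extra indirection per frame
}

// the side "frame-stack": everything the collector would otherwise
// have scanned in place on the callstack.
type frame struct {
	locals  []uintptr // pointer-sized slots for the collector to scan
	spilled []uintptr // register values dumped here so they're visible
	unwind  uint32    // layout/unwind metadata
}

// unwinding now chases framePtr for every entry instead of using
// offsets local to the stack itself.
func unwindAll(stack []callStackEntry) {
	for i := len(stack) - 1; i >= 0; i-- {
		f := stack[i].framePtr
		_ = f.locals  // scan/release values over in the frame-stack
		_ = f.spilled // then return via stack[i].returnAddr
	}
}
```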
let's see what you had to say when I said the stack was fine, "No, it's not perfectly fine. It's full of all kinds of other information that isn't relevant. That makes it larger. It also is spread out across more memory, which isn't great in a locality-of-use way"
ah, right. the stack is full of other information. all that other information will still be in your stack. what information is in the frame? your locals. overflow arguments if you have too many. register values it needs to restore before returning. there's some dwarf data to use in the lookup table for frame layout/unwinding. most of this will need to be copied into your frame-stack so the values can be scanned there instead of on the stack itself.
as for making it larger, all that information still needs to live on the stack. moving the data to a different location and adding a pointer over to it for each frame is larger than just having it on the stack.
you're just pointlessly increasing the work the program needs to do.
fine. it's not in a stack. it's in a big old table. how do you look it up in the table? what kind of index does the table use? does the table hold frames? are they typed or unioned? how does any of your not-a-stack work to hold the frame data from your very-much-a-stack callstack? is this just a reinvention of heap-frames because stack hard?
I'd be done too if all I had was handwaving and namecalling to back up my opinions. You've been pointlessly rude this entire exchange.
Adakantor@reddit
Isn’t Go pretty famous for ignoring modern language design?
Maybe-monad@reddit
Not ignoring, rediscovering
satansprinter@reddit
I would say redefining but yeah
Schmittfried@reddit
I would say you drank the koolaid.
satansprinter@reddit
Not really, I don't even write Go in my day job. But the way its concurrency works is by far the best
crozone@reddit
Yes, because it wasn't designed to be used by experienced programmers who can make effective use of many of the wonderful language features developed over the past 30 years. Those were intentionally thrown away in the name of simplicity.
Instead, it was explicitly designed to be used by short term contractors who need to pick up a project, shit out a feature, and then move on as quickly as possible. The language needed to be simple to a fault, and good for churning out low to medium quality code that can be easily tested and thrown into production. Whatever feature the code implements doesn't need to live very long because Google will probably shelve the project before the code quality becomes relevant.
case-o-nuts@reddit
For choosing to opt out of it. Several of the author's previous languages had, for example, implemented generics.
stumblinbear@reddit
Wild to me someone looked at this guy's language and thought "yeah they'd be a good hire for a new one we're making"
afl_ext@reddit
Have as much money as Google and you can make any bad decision you like
jug6ernaut@reddit
Me when I learned that Golang is a newer language than Rust despite feeling 20+ years older.
nexxai@reddit
From the article
PenlessScribe@reddit
I'm used to seeing the SP saved at a fixed offset from *FP, so at runtime you don't need to keep track of space added to the stack frame due to C VLAs or alloca(), you just move the saved SP to SP before returning. Does ARM mandate doing things differently?
IntelligentNotice386@reddit
The problem here is that for large offsets ARM can't encode the SP subtraction in a single immediate, so the compiler splits it into two instructions, and if the program gets paused after the first instruction but before the second one, the SP will be invalid.
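A minimal sketch of the kind of function that triggers this, assuming the AArch64 encoding rules (illustrative only, not the actual code from the post): ADD/SUB takes a 12-bit immediate, optionally shifted left by 12, so a frame adjustment that fits neither form gets split across two instructions.

```go
package main

// illustrative only -- not the reproducer from the post. a frame
// adjustment that is neither <= 4095 nor a multiple of 4096 can't be
// encoded as a single AArch64 SUB immediate, so the prologue becomes
// something like (hypothetical codegen):
//
//     SUB SP, SP, #16, LSL #12   // SP -= 65536
//     SUB SP, SP, #48            // SP -= 48
//
// a signal delivered between the two observes a half-updated SP.

//go:noinline
func bigFrame() byte {
	var buf [65536 + 32]byte // big enough to force a split SP adjustment
	buf[0] = 1
	return buf[len(buf)-1]
}

func main() {
	_ = bigFrame()
}
```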
bitfieldconsulting@reddit
This is actually a great way to learn a little about how the scheduler and runtime work—by seeing what happens when they don't!
CorgixAI@reddit
Debugging low-level runtime issues always highlights just how complex modern software stacks truly are. The methodical approach to isolating the bug—in particular, brainstorming root causes and synthetically reproducing edge cases—feels like a textbook example for tricky multi-threaded bugs. It's a reminder that even well-tested systems can hide deep, subtle problems, especially when compilers and runtime interact so closely. Kudos to the team for their persistence and for sharing the process!
Coffee_Ops@reddit
Is it just me or is the writing style really weird? It feels like they reused a number of phrases an awful lot:
And the story seems to loop back on itself -- they found a reported bug that they were pretty confident was related, but there was nothing there they didn't know. So they did some more testing (what?) which was unsuccessful, but recalling that there's a bug, it might be a runtime bug, but then again it might not, and by the way remember that netlink is involved, and the crash is remote from the root cause....
It feels like either a whole lot of detail was edited out, or they needed to pad the content, or it was partially authored by LLM.
Or am I just being nitpicky here?
BinaryRockStar@reddit
It's the story of investigating a multi-threaded race condition resulting in stack corruption, (IMO) one of the worst classes of bugs to investigate and hardest to solve. It isn't deterministically reproducible, and the evidence you are working with often doesn't contain the information you need to find the root cause.
Due to this, the nature of the investigation is brainstorming all possible causes ranked by likelihood, then trying to exacerbate each postulated cause by synthetic means, such as writing a small program that is more likely to cause the failure due to its exaggerated use of, say, stack size, memory allocation, thread creation or garbage collection. Each postulated cause is looked into and either yields further evidence or comes to a dead end, which in itself is useful.
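A hypothetical harness in that spirit (entirely invented here, not from the post) might exaggerate stack growth, goroutine churn and GC pressure all at once:

```go
package main

import (
	"runtime"
	"sync"
)

// invented stress harness, not from the post: pile up deep, fat stack
// frames across many goroutines while forcing collections, to raise
// the odds of catching a stack-scanning race.

//go:noinline
func deepen(n int) int {
	var pad [4096]byte // fatten every frame to force stack growth
	pad[0] = byte(n)
	if n == 0 {
		return int(pad[0])
	}
	return deepen(n-1) + int(pad[0])
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 64; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				_ = deepen(100) // repeatedly grow and shrink the stack
			}
		}()
	}
	for i := 0; i < 100; i++ {
		runtime.GC() // collections race against the stack churn
	}
	wg.Wait()
}
```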
So I can excuse the re-use of language detailing the issue "remote from the root cause", spelling out the most likely cause "felt confident that" and discussing the facts gleaned from testing "it was clear that". Having said that, I think you're being nitpicky, as it's a reasonably in-depth blog post and I only see 1 usage, 2 usages, and 1 usage of those alternations, respectively. Not particularly repetitive in my opinion. I don't think this is AI generated; in fact it feels entirely wetware produced.
Coffee_Ops@reddit
All fair points, and I do see that the listed author has a background that makes it likely they were involved in troubleshooting this.
It's very possible that their writing style is a little less polished than their engineering and root cause analysis skills, and that's totally forgivable. And I do suspect that there was some necessary editing that may have led to the stilted style.
jamzex@reddit
Given that the Cloudflare blog is more a collection of its engineers personally writing things up, I prefer the slightly longer, more invested style of writing from the people who worked on the issue. Toning it down would likely make this a less interesting read, even if it were more efficient.
headinthesky@reddit
It definitely needed to be trimmed by an editor
kur0saki@reddit
I thought I was the only one who felt like they were hearing a story told by some 14-year-old dude who did some skate trick for the first time :D
Halkcyon@reddit
I don't think so. This is becoming common and reading blogs lately is exhausting as a result.