One Method Was Using 71% of CPU. Here's the Flame Graph.
Posted by ketralnis@reddit | programming | View on Reddit | 17 comments
LevelIndependent672@reddit
ngl 71% on one method is actually kinda satisfying lol. way better than when its spread across 20 different places and you dont even know where to start
Ameisen@reddit
The "death by a thousand cuts" issue is what happens when people take the "premature optimization" quote to an extreme.
Yeah, if you completely ignore performance best practices, those branch misses, cache misses, false sharing, etc. start to add up.
Matthew94@reddit
For one thing, it's not part of the C++ standard.
Ameisen@reddit
Every mainstream compiler supports it as an extension in C++.
People use extensions all the time - often without realizing it.
valarauca14@reddit
While you're technically correct - the best kind of correct - what you take issue with is a logical side effect of having a restrict pointer within your block scope.
How can you access the underlying type/object/buffer without violating the definition of restrict?
If you can modify an l-value in two different ways in the same block scope, where one pointer is restrict-qualified and the other isn't, then making that one pointer restrict was already UB.
Ameisen@reddit
In an older comment on an older post, I specify that
__restrict has the opposite logical constraints of const - you can often safely remove __restrict, but you cannot safely add it.

In the case of two variables within the same scope, though, well, yes - that becomes more problematic. Don't do that.
In terms of calling member functions, though, it's a non-issue either way - you should be able to call any member function as no constraints are violated.
Necessary-Signal-715@reddit
I wish I could think about low-level optimization at work. The average B2B/enterprise software is slow because of very high-level stupidities like DB queries or service calls in a loop (the n+1 query problem), thanks to ORMs and microservices. I've also seen a lot of parallelized code, workers, actors, etc. where a simple single-threaded implementation turned out to be massively faster. That being said, after eliminating such cases from my workplace's software, the remaining profiler output is definitely "death by a thousand cuts"...
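A minimal sketch of the n+1 shape the commenter describes - all names are hypothetical, and the "database" is simulated with a round-trip counter rather than a real ORM, just to show why the per-item loop costs n round-trips while a batched lookup costs one:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class NPlusOneDemo {
    static int queryCount = 0; // stands in for round-trips to a real database

    // One row per call: the per-item lookup an ORM's lazy loading often hides.
    static String fetchAuthorById(int id) {
        queryCount++;
        return "author-" + id;
    }

    // One batched call: a single IN (...)-style query for all ids at once.
    static Map<Integer, String> fetchAuthorsByIds(List<Integer> ids) {
        queryCount++;
        return ids.stream()
                  .collect(Collectors.toMap(Function.identity(), id -> "author-" + id));
    }

    public static void main(String[] args) {
        List<Integer> postAuthorIds = List.of(1, 2, 3, 4, 5);

        // n+1 shape: 1 query for the posts (not shown) + n queries in the loop.
        for (int id : postAuthorIds) {
            fetchAuthorById(id);
        }
        System.out.println("per-item queries: " + queryCount);

        queryCount = 0;
        // Batched shape: one round-trip regardless of n.
        fetchAuthorsByIds(postAuthorIds);
        System.out.println("batched queries: " + queryCount);
    }
}
```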
SwitchOnTheNiteLite@reddit
That implementation in TrendDetector feels like it was coded specifically to make an article like this. Very strange implementation, bordering on a bug.
thisisjustascreename@reddit
I've seen that stream-within-loop antipattern several times in real code. The Enterprise Java world fell so in love with Streams they became Streams developers and started hammering in screws.
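A generic illustration of that antipattern (not the article's actual TrendDetector code): building a fresh stream pipeline inside a loop turns a membership check into an O(n) scan per iteration, O(n^2) overall, when a set built once makes each probe O(1).

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StreamInLoopDemo {
    public static void main(String[] args) {
        List<Integer> ids = IntStream.range(0, 1_000).boxed().toList();
        List<Integer> wanted = IntStream.range(0, 1_000).boxed().toList();

        // Antipattern: a new stream pipeline per loop iteration.
        // Each anyMatch is a linear scan, so the loop is O(n^2),
        // plus per-iteration stream and lambda allocation.
        int slowHits = 0;
        for (int w : wanted) {
            if (ids.stream().anyMatch(id -> id == w)) {
                slowHits++;
            }
        }

        // Fix: build the lookup structure once, then probe it in O(1).
        Set<Integer> idSet = ids.stream().collect(Collectors.toSet());
        long fastHits = wanted.stream().filter(idSet::contains).count();

        System.out.println(slowHits + " " + fastHits);
    }
}
```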
davvblack@reddit
yeah it’s kinda unsatisfying to see your top contributor at 3%. the fn now takes 98% of the time it did last week
BrycensRanch@reddit
I'm confused why this was downvoted. The article looks good, at least skimming it, and this comment is exactly what I'm thinking. I can get behind making a resource-intensive function use fewer resources.
Majik_Sheff@reddit
It's nice when you have a single element that just jumps off the page. It makes it much easier to pinpoint when the regression happened.
Most of my work is on embedded systems, so I'm just fine until an IRQ starts chewing up cycles for no apparent reason.
null_reference_user@reddit
Oh shit, it better not be main again, that motherfucker
Ameisen@reddit
main is almost never where most of my time is taken up.
GasterIHardlyKnowHer@reddit
Claude AI slop
Dragdu@reddit
I might just be a boring old C++ programmer, but I find it hard to believe that "2500 (virtual) threads" is a realistic concurrency example.
aoeudhtns@reddit
It's certainly enough to increase contention, given the way they work in the JVM: whenever a virtual thread hits a potentially blocking operation (IO, waiting on a lock or monitor, etc.), the runtime can context-switch to another virtual thread on the same physical OS thread (a pthread, basically) while it waits for the wake-up/join. All of the lock/contention machinery is the same regardless of the number of pthreads.
Many traditional frameworks in Java do use "thread per request" modeling, so in an old Tomcat server (for example), 3,000 reqs/sec could mean up to 3,000 threads. Yeah it wasn't good. In one of my own applications, we went from 3k+ real threads to ~100 real threads when we started submitting jobs on virtual threads.
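A minimal sketch of the job-submission pattern described above, assuming JDK 21+ (the task count and sleep are arbitrary stand-ins for blocking work): each submitted job gets its own virtual thread, and the JVM multiplexes all of them onto a small pool of carrier OS threads, parking a virtual thread whenever it blocks.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();

        // One virtual thread per task; the runtime schedules them onto a
        // small pool of carrier (OS) threads, unmounting each virtual
        // thread while it blocks instead of pinning a pthread.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 2_500).forEach(i -> exec.submit(() -> {
                try {
                    Thread.sleep(Duration.ofMillis(10)); // simulated blocking IO
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                completed.incrementAndGet();
            }));
        } // close() waits for all submitted tasks to finish

        System.out.println("completed: " + completed.get());
    }
}
```

The try-with-resources block works because ExecutorService became AutoCloseable; close() awaits completion of all submitted tasks before the count is printed.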