One Method Was Using 71% of CPU. Here's the Flame Graph.
Posted by ketralnis@reddit | programming | View on Reddit | 17 comments
LevelIndependent672@reddit
ngl 71% on one method is actually kinda satisfying lol. way better than when its spread across 20 different places and you dont even know where to start
Ameisen@reddit
The "death by a thousand cuts" issue is what happens when people take the "premature optimization" quote to an extreme.
Yeah, if you completely ignore performance best practices, those branch misses, cache misses, false sharing, etc. start to add up.
Matthew94@reddit
For one thing, it's not part of the C++ standard.
Ameisen@reddit
Every mainstream compiler supports it as an extension in C++.
People use extensions all the time - often without realizing it.
valarauca14@reddit
While you're technically correct - the best kind of correct - what you take issue with is a logical side effect of having a restrict pointer within your block scope.
How can you access the underlying type/object/buffer without violating the definition of restrict?
If you can modify an l-value in two different ways in the same block scope, where one pointer is restrict-qualified and the other isn't, then making that one pointer restrict was already UB.
Ameisen@reddit
In an older comment on an older post, I specify that
__restrict has the opposite logical constraints of const - you can often safely remove __restrict, but you cannot safely add it.

In the case of two variables within the same scope, though, well, yes - that becomes more problematic. Don't do that.
In terms of calling member functions, though, it's a non-issue either way - you should be able to call any member function as no constraints are violated.
Necessary-Signal-715@reddit
I wish I could think about low-level optimization at work. The average B2B/enterprise software is slow because of very high-level stupidities like DB queries or service calls in a loop (the n+1 query problem), thanks to ORMs and microservices. I've also seen a lot of parallelized code, workers, actors, etc. where a simple single-threaded implementation turned out to be massively faster. That being said, after eliminating such cases from my workplace's software, the remaining profiler output is definitely "death by a thousand cuts"...
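A minimal sketch of the n+1 shape the commenter describes - all names are hypothetical, and the "database" is simulated with a round-trip counter rather than a real ORM, just to show why the per-item loop costs n round-trips while a batched lookup costs one:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class NPlusOneDemo {
    static int queryCount = 0; // stands in for round-trips to a real database

    // One row per call: the per-item lookup an ORM's lazy loading often hides.
    static String fetchAuthorById(int id) {
        queryCount++;
        return "author-" + id;
    }

    // One batched call: a single IN (...)-style query for all ids at once.
    static Map<Integer, String> fetchAuthorsByIds(List<Integer> ids) {
        queryCount++;
        return ids.stream()
                  .collect(Collectors.toMap(Function.identity(), id -> "author-" + id));
    }

    public static void main(String[] args) {
        List<Integer> postAuthorIds = List.of(1, 2, 3, 4, 5);

        // n+1 shape: 1 query for the posts (not shown) + n queries in the loop.
        for (int id : postAuthorIds) {
            fetchAuthorById(id);
        }
        System.out.println("per-item queries: " + queryCount);

        queryCount = 0;
        // Batched shape: one round-trip regardless of n.
        fetchAuthorsByIds(postAuthorIds);
        System.out.println("batched queries: " + queryCount);
    }
}
```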
SwitchOnTheNiteLite@reddit
That implementation in TrendDetector feels like it was coded specifically to make an article like this. Very strange implementation, bordering on a bug.
thisisjustascreename@reddit
I've seen that stream-within-loop antipattern several times in real code. The Enterprise Java world fell so in love with Streams they became Streams developers and started hammering in screws.
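A generic illustration of that antipattern (not the article's actual TrendDetector code): building a fresh stream pipeline inside a loop turns a membership check into an O(n) scan per iteration, O(n^2) overall, when a set built once makes each probe O(1).

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StreamInLoopDemo {
    public static void main(String[] args) {
        List<Integer> ids = IntStream.range(0, 1_000).boxed().toList();
        List<Integer> wanted = IntStream.range(0, 1_000).boxed().toList();

        // Antipattern: a new stream pipeline per loop iteration.
        // Each anyMatch is a linear scan, so the loop is O(n^2),
        // plus per-iteration stream and lambda allocation.
        int slowHits = 0;
        for (int w : wanted) {
            if (ids.stream().anyMatch(id -> id == w)) {
                slowHits++;
            }
        }

        // Fix: build the lookup structure once, then probe it in O(1).
        Set<Integer> idSet = ids.stream().collect(Collectors.toSet());
        long fastHits = wanted.stream().filter(idSet::contains).count();

        System.out.println(slowHits + " " + fastHits);
    }
}
```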
davvblack@reddit
yeah it’s kinda unsatisfying to see your top contributor at 3%. the fn now takes 98% of the time it did last week
BrycensRanch@reddit
I'm confused why this was downvoted. The article looks good, at least skimming it, and this comment is exactly what I'm thinking. I can get behind making a resource-intensive function use fewer resources.
Majik_Sheff@reddit
It's nice when you have a single element that just jumps off the page. It makes it much easier to pinpoint when the regression happened.
Most of my work is on embedded systems, so I'm just fine until an IRQ starts chewing up cycles for no apparent reason.
null_reference_user@reddit
Oh shit, it better not be main again, that motherfucker
Ameisen@reddit
main is almost never where most of my time is taken up.
GasterIHardlyKnowHer@reddit
Claude AI slop
Dragdu@reddit
I might just be a boring old C++ programmer, but I find it hard to believe that "2500 (virtual) threads" is a realistic concurrency example.
aoeudhtns@reddit
It's certainly enough to increase contention, given the way they work in the JVM: whenever a virtual thread hits a potentially blocking operation (IO, waiting on a lock or monitor, etc.), the runtime can context-switch to another virtual thread on the same physical OS thread (a pthread, basically) while it waits for the wake-up/join. All of the lock/contention machinery is the same regardless of the number of pthreads.
Many traditional frameworks in Java do use "thread per request" modeling, so in an old Tomcat server (for example), 3,000 reqs/sec could mean up to 3,000 threads. Yeah it wasn't good. In one of my own applications, we went from 3k+ real threads to ~100 real threads when we started submitting jobs on virtual threads.
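A minimal sketch of the job-submission pattern described above, assuming JDK 21+ (the task count and sleep are arbitrary stand-ins for blocking work): each submitted job gets its own virtual thread, and the JVM multiplexes all of them onto a small pool of carrier OS threads, parking a virtual thread whenever it blocks.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        AtomicInteger completed = new AtomicInteger();

        // One virtual thread per task; the runtime schedules them onto a
        // small pool of carrier (OS) threads, unmounting each virtual
        // thread while it blocks instead of pinning a pthread.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 2_500).forEach(i -> exec.submit(() -> {
                try {
                    Thread.sleep(Duration.ofMillis(10)); // simulated blocking IO
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                completed.incrementAndGet();
            }));
        } // close() waits for all submitted tasks to finish

        System.out.println("completed: " + completed.get());
    }
}
```

The try-with-resources block works because ExecutorService became AutoCloseable; close() awaits completion of all submitted tasks before the count is printed.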