I would have liked to have known how much the CPU throttled down. I have several small form factor minis (different brands) and they all throttle the CPU under heavy load; there simply isn't enough heat dissipation. To be clear, I am not talking about overclocking, just putting the CPU under heavy load; the small-footprint devices are at a disadvantage. That hasn't stopped me from owning several, they are fantastic.
I am neither disagreeing nor agreeing here. I would like to have seen the heat and CPU throttling as part of the presentation.
Intel Core 2 (Conroe) peaked at around 3.5GHz (65nm) in 2006 with 2 cores. This was right around the time when Dennard scaling failed. Agner Fog says it has a 15-cycle branch prediction penalty.
Golden Cove peaked at 5.5GHz (7nm; I've read 12/14 stages but also a minimum 17-cycle prediction penalty, so I don't know) in 2021 with 8 cores. Agner Fog references an AnandTech article saying Golden Cove has a 17+ cycle penalty.
Putting all that together, going from Core 2 at 3.5GHz to the 5.4GHz peak in his system is roughly a 54% clockspeed increase. The branch prediction penalty growing by at least 13% (17 vs 15 cycles) cuts the actual relative speed improvement on this branch-heavy code down to something more like 35%.
The real point here is about predictability and dependency handcuffing wider cores.
Golden Cove can look hundreds of instructions ahead, but if everything is dependent on everything else, it can't use that to speed things up.
Golden Cove can decode 6 instructions at once vs 4 for Core 2, but that also doesn't do much here because the whole loop probably fits in cache (and in Golden Cove's uop cache) anyway.
Golden Cove has 5 ALU ports and 7 load/store/AGU ports (not unified). Core 2 has 3 ALU ports and 3 load/store/AGU ports (not unified). This seems like a massive Golden Cove advantage, but when OoO is nullified, they don't do very much. As I recall, in-order systems get a massive 80% performance boost from adding a second port, but the third port is mostly unused (less than 25% IIRC) and 4th port usage is only 1-2%. This means the 4th and 5th ports on Golden Cove are doing basically nothing. And because most of the ALUs aren't being used (and there's no SIMD), the extra load/store ports don't do much either.
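To make the dependency point concrete, here's a minimal C sketch (mine, not from the video) of the kind of loop being described: every iteration feeds the next, so the chain latency, not the number of ALU ports, sets the pace.

    #include <stddef.h>
    #include <stdint.h>

    /* Every iteration depends on the previous value of s, so a 5-ALU core
     * and a 3-ALU core both crawl along at one xor+multiply latency per
     * element; the extra ports have nothing independent to work on.      */
    uint64_t chained_hash(const uint64_t *a, size_t n) {
        uint64_t s = 0;
        for (size_t i = 0; i < n; i++)
            s = (s ^ a[i]) * 0x9E3779B97F4A7C15ULL;  /* serial dependency chain */
        return s;
    }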
Golden Cove has massive amounts of silicon dedicated to prefetching data. It can detect many kinds of access patterns far in advance and grab the data before the CPU gets there. Core 2 caching is far more limited in both size and capability. The problem in this benchmark is that arrays are already super-easy to predict, so Core 2 likely has a very high cache hit rate. I'm not sure, but the data for this program might also completely fit inside the cache which would eliminate the RAM/disk speed differences too.
This program seems like an almost ideal example of the worst-case scenario for branch prediction. I'd love to see him run this benchmark on something like ARM's in-order A55 or the recently-announced A525. I'd guess those minuscule in-order cores at 2-2.5GHz would be 40-50% of the performance of his Golden Cove setup.
Yup, the problem is simple: there was a point, a while ago actually, where adding more silicon didn't do shit because the biggest limits were architectural/design issues. Basically x86 (both 64-bit and non-64-bit) hit its limits at least ~10 years ago, and from there the benefits became highly marginal instead of exponential.
Now they've added new features that allow better use of the hardware and sidestep the issues. I bet that code from 15 years ago, if recompiled with modern compilers, would get a notable increase, but software compiled 15 years ago would certainly follow the pattern we see today.
ARM certainly allows an improvement. Anyone using a Mac with an M-series CPU would easily attest to this. I do wonder (as personal intuition) if this is fully true, or just the benefit of forcing a recompilation. I think it can also improve certain aspects, but we've hit another limit, fundamental to von Neumann-style architectures. We were able to extend it by adding caches on top of the whole thing, in multiple layers, but this only delayed the inevitable issue.
At this point the cost of accessing RAM dominates so thoroughly that as soon as you hit RAM in a way that wasn't prefetched (which is very hard to prevent in the cases that keep happening), CPU speed barely matters. That is, if there's some time T of useful work between uncached memory accesses, the cost of each access is something like 100T (assuming we don't need to hit swap), so CPU speed is negligible compared to how much time is spent just waiting for RAM. Yes, you can avoid these memory hits, but it requires a careful design of the code that you can't fix at the compiler level alone; you have to write the code differently to take advantage of this.
Hence the issue. Most of the hardware improvements are marginal instead, because we're stuck on the memory bottleneck. This matters because software has been designed with the idea that hardware was going to give exponential improvements. That is, software built ~4 years ago is expected to run 8x faster by now, but in reality we see only ~10% of the improvement we saw over the last similar jump. So software feels crappy and bloated, even though the engineering is solid, because it's done with the expectation that hardware alone will fix it. Sadly that's not the case.
I believe the real ARM difference is in the decoder (and eliminating all the edge cases) along with some stuff like looser memory ordering.
x86 decode is very complex. Find the opcode byte and check if a second opcode byte is used. Check the instruction to see if the mod/register byte is used. If the mod/register byte is used, check the addressing mode to see if you need 0 bytes, 1 displacement byte, 4 displacement bytes, or 1 scaled index byte. And before all of this, there's basically a state machine that encodes all the known prefix byte combinations.
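For illustration only, here's a toy C sketch of that length-finding logic (it assumes a ModRM byte is always present and ignores REX/VEX/EVEX prefixes, immediates, and plenty of other cases real decoders must handle); the point is just that each step depends on the byte before it.

    #include <stddef.h>
    #include <stdint.h>

    /* Toy sketch of x86 length decoding: legacy prefixes, optional 0x0F
     * escape, ModRM, optional SIB, optional displacement. Real decoders
     * also handle REX/VEX/EVEX, immediates, and many special cases; this
     * only shows why the length isn't known until several bytes have
     * been examined one after another.                                   */
    static int is_legacy_prefix(uint8_t b) {
        return b == 0x66 || b == 0x67 || b == 0xF0 || b == 0xF2 || b == 0xF3 ||
               b == 0x2E || b == 0x36 || b == 0x3E || b == 0x26 || b == 0x64 || b == 0x65;
    }

    size_t sketch_insn_length(const uint8_t *p) {
        size_t len = 0;
        while (is_legacy_prefix(p[len])) len++;   /* prefix "state machine"  */
        if (p[len] == 0x0F) len++;                /* two-byte opcode escape  */
        len++;                                    /* opcode byte             */
        uint8_t modrm = p[len++];                 /* assume ModRM is present */
        uint8_t mod = modrm >> 6, rm = modrm & 7;
        if (mod != 3 && rm == 4) len++;           /* SIB byte                */
        if (mod == 1) len += 1;                   /* disp8                   */
        else if (mod == 2) len += 4;              /* disp32                  */
        else if (mod == 0 && rm == 5) len += 4;   /* disp32, no base         */
        return len;                               /* immediates omitted      */
    }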
The result of all this stuff is extra pipeline stages and extra branch prediction penalties. M1 supposedly has a 13-14 cycle penalty while Golden Cove has a 17+ cycle penalty. This alone is an 18-24% improvement for the same clockspeed on this kind of unpredictable code.
Modern systems aren't Von Neumann where it matters. They share RAM and high-level cache between code and data, but these split apart at the L1 level into I-cache and D-cache so they can gain all the benefits of Harvard designs.
"4000MHz" RAM is another lie people believe. The physics of the capacitors in silicon limit cycling of individual cells to 400MHz or 10x slower. If you read/write the same byte over and over, the RAM of a modern system won't be faster than that old Core 2's DDR2 memory and may actually be slower in total nanoseconds in real-world terms. Modern RAM is only faster if you can (accurately) prefetch a lot of stuff into a large cache that buffers the reads/writes.
A possible solution would be changing some percentage of the storage into larger, but faster SRAM then detect which stuff is needing these pathological sequential accesses and moving it to the SRAM.
At the same time, Moore's Law also died in the sense that the smallest transistors aren't getting much smaller each node shrink as seen by the failure of SRAM (which uses the smallest transistor sizes) to decrease in size on nodes like TSMC N3E.
Unless something drastic happens at some point, the only way to gain meaningful performance improvements will be moving to lower-level languages.
I believe the real ARM difference is in the decoder (and eliminating all the edge cases) along with some stuff like looser memory ordering.
The last part is important. Memory models are important because they define how consistency is kept across multiple copies (in the cache layers as well as RAM). Being able to loosen the requirements means you don't need to sync cache changes at a higher level, nor do you need to keep RAM in sync, which reduces waiting for slower operations.
x86 decode is very complex.
Yes, but nowadays x86 gets pre-decoded into microcode/microops, which is a RISC encoding, and has most of the advantages of ARM, at least when code is running.
But yeah, in certain cases the pre-decoding needs to be accounted for, and there are various issues that make things messy.
The result of all this stuff is extra pipeline stages and extra branch prediction penalties. M1 supposedly has a 13-14 cycle penalty while Golden Cove has a 17+ cycle penalty.
I think that the penalty comes from how long the pipeline is (therefore how much needs to be redone). I think part of the reason this is fine is because the M1 gets a bit more flexibility in how it spreads power across cores, letting it run at higher speeds without increasing power consumption too much. Intel (and this is my limited understanding, I am not an expert in the field) instead, with no efficient cores, uses optimizations such as longer pipelines so that the CPU is able to run "faster" (as in faster wallclock) at lower CPU hertz.
Modern systems aren't Von Neumann where it matters.
I agree, which is why I called them "Von Neumann style", but the details you mention about it being like a Harvard architecture at the CPU level matter little here.
I argue that the impact of reading from cache is negligible in the long run. It matters, but not too much, and as the M1 showed there's room to improve things there. The reason I claim this is that once you have to hit RAM you get the real impact.
"4000MHz" RAM is another lie people believe...
You are completely correct in this paragraph. You also need the CAS latency there. A quick search showed me a DDR5-6000 kit with a CL28 CAS. Multiply the CAS latency by 2000, divide by the transfer rate (6000), and you get ~9.3ns true latency. DDR5 lets you load a lot of memory each cycle, but again, here we're assuming you didn't have the memory in cache, so you have to wait. I remember buying RAM and researching latency ~15 years ago, and guess what? Real RAM latency was still ~9ns.
At 4.8GHz, that's ~43.2 cycles of waiting. Now, most operations take more than one cycle, but I think my estimate of ~10x waiting is reasonable. When you consider that CPUs nowadays do multiple operations per cycle (thanks to pipelining), you realize that you may have something closer to 100x operations that you didn't do because you were waiting. So CPUs are doing less each time (which is part of why the focus has been on power saving; making CPUs that hog power to run faster is useless because they still end up just waiting most of the time).
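To put rough numbers on that (same assumed figures as above: DDR5-6000, CL28, a ~4.8GHz core; estimates, not measurements):

    #include <stdio.h>

    /* Back-of-the-envelope DRAM latency math from the numbers above.
     * These are assumed figures, not measurements.                   */
    int main(void) {
        double mt_per_s    = 6000.0;  /* DDR5-6000 transfer rate */
        double cas_cycles  = 28.0;    /* CL28                    */
        double cpu_ghz     = 4.8;

        double latency_ns   = cas_cycles * 2000.0 / mt_per_s;  /* ~9.3 ns    */
        double stall_cycles = latency_ns * cpu_ghz;            /* ~45 cycles */

        printf("CAS latency: %.1f ns, ~%.0f CPU cycles stalled\n",
               latency_ns, stall_cycles);
        return 0;
    }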
That said, for the last 10 years most people would "feel" the speedup without realizing that it was because they were saving on swap. Having to hit disk, even a really fast M.2 SSD, is ~10,000-100,000x the wait time in comparison. Having more RAM means you don't need to push memory pages out to disk, and that saves a lot of time.
Nowadays OSes will even preload disk contents into RAM, which reduces load latency even more. That said, once a program is running, people don't notice a speed increase.
A possible solution would be changing some percentage of the storage into larger, but faster SRAM
I argue that the increase would be minimal. Even halving the latency would still leave the time dominated by waiting for RAM.
I think that a solution would be to rethink memory architecture. Another is to expose even more "speed features" such as prefetching or reordering explicitly through the bytecode somehow. Similar to ARM's looser memory model helping the M2 be faster, compilers and others may be able to better optimize prefetching, pipelining, etc. by having context that the CPU just doesn't have, allowing for things that wouldn't work for all code, but would work for this specific code because of context that isn't inherent to the bytecode itself.
At the same time, Moore's Law also died in the sense that the smallest transistors
Yeah, I'd argue that happened even before. That said, it was never Moore's Law that "efficiency/speed/memory will double every so often"; rather, that we'd be able to double the number of transistors in a given space for half the price. There's a point where more transistors are marginal, and in "computer speed" we stopped the doubling sometime in the early 2000s.
Unless something drastic happens at some point, the only way to gain meaningful performance improvements will be moving to lower-level languages.
I'd argue the opposite: high-level languages are probably the ones that will best be able to take advantage of changes without rewriting code; you would just need to recompile. With low-level languages you need to be aware of these details, so a lot of code needs to be rewritten.
But if you're using the same binary from 10 years ago, well there's little benefit from "faster hardware".
Yes, but nowadays x86 gets pre-decoded into microcode/microops, which is a RISC encoding, and has most of the advantages of ARM, at least when code is running.
It doesn't pre-decode per se. It decodes and will either go straight into the pipeline or into the uop cache and then into the pipeline, but it still has to be decoded, and that adds to the pipeline length. The uop cache is decent for not-so-branchy code, but not so great for other code. I'd also note that people think of uops as small, but they are usually LARGER than the original instructions (I've read that x86 uops are nearly 128 bits wide) and each x86 instruction can potentially decode into several uops.
A study of Haswell showed that integer instructions (like the stuff in this application) were especially bad at using the uop cache, with a less than 30% hit rate, and the uop decoder using over 20% of the total system power. Even in the best case of all-float instructions, the hit rate was just around 45%, though that (combined with the lower float instruction rate) reduced decoder power consumption to around 8%. Uop caches have increased in size significantly, but even 4,000 ops for Golden Cove really isn't that much compared to how many instructions are in the program.
I'd also note that the uop cache isn't free. It adds its own lookup latencies and the cache + low-latency cache controller use considerable power and die area. ALL the new ARM cores from ARM, Qualcomm, and Apple drop the uop cache. Legacy garbage costs a lot too. ARM reduced decoder area by some 75% in their first core to drop ARMv8 32-bit (I believe it was A715). This was also almost certainly responsible for the majority of their claimed power savings vs the previous core.
AMD's 2x4 decoder scheme (well, it was described in a non-AMD paper decades ago) is an interesting solution, but it adds way more complexity to the implementation, trying to track all the branches through the cache, and it can potentially bottleneck on long branch-free code sequences that leave the second decoder with nothing to work on.
Intel... uses optimizations such as longer pipelines so that the CPU is able to run "faster" (as in faster wallclock) at lower CPU hertz.
That is partially true, but the clock differences between Intel and something like the M4 just aren't that large anymore. When you look at ARM chips, they need fewer decode stages because there's so much less work to do per instruction and it's so much easier to parallelize. If Intel needs 5 stages to decode and 12 for the rest of the pipeline while Apple needs 1 stage to decode and 12 for everything else, the Apple chip will be doing the same amount of work in the same number of stages at the same clockspeed, but with a much lower branch prediction penalty.
Another is to expose even more "speed features" such as prefetching or reordering explicitly through the bytecode somehow.
RISC-V has hint instructions that include prefetch.i which can help the CPU more intelligently prefetch stuff.
Unfortunately, I don't think compilers will ever do a good job at this. They just can't reason well enough about the code. The alternative is hand-coded assembly, but x86 (and even ARM) assembly is just too complex for the average developer to learn and understand. RISC-V does a lot better in this regard IMO, though there's still tons to learn. Maybe this is something JITs can do to finally catch up with AOT native code.
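For what it's worth, GCC and Clang already expose a portable data-prefetch hint, __builtin_prefetch. A minimal sketch follows; the distance constant is a made-up tuning knob, and for a plain linear scan like this the hardware prefetcher would cope on its own, so the hint only really pays off for irregular access patterns, which is exactly where choosing the distance by hand gets hard.

    #include <stddef.h>

    /* Hand-placed data prefetching with the GCC/Clang __builtin_prefetch
     * hint. PREFETCH_DISTANCE is a made-up tuning knob; the right value
     * depends on memory latency and loop body cost, which is exactly the
     * context a compiler struggles to guess.                             */
    #define PREFETCH_DISTANCE 16

    long sum_with_prefetch(const long *data, size_t n) {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
            if (i + PREFETCH_DISTANCE < n)
                __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 1);  /* read, low locality */
            sum += data[i];
        }
        return sum;
    }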
I'd argue the opposite: high-level languages are probably the ones that will best be able to take advantage of changes without rewriting code; you would just need to recompile. With low-level languages you need to be aware of these details, so a lot of code needs to be rewritten.
The compiler bit in the video is VERY wrong in its argument. Here's an archived anandtech article from the 2003 Athlon64 launch showing the CPU getting a 10-34% performance improvement just from compiling in 64-bit instead of 32-bit mode. The 64-bit compiler of 2003 was pretty much at its least optimized and the performance gains were still very big.
The change from 8 GPRs (all of which were actually special-purpose and only sometimes reusable) to 16 GPRs (with half being truly reusable), along with a better ABI, meant big performance increases moving to 64-bit programs. Intel is actually still considering their APX extension, which adds 3-register instructions and 32 registers to further decrease the number of MOVs needed (though it requires an extra prefix byte, so it's a very complex tradeoff about when to use what).
An analysis of the x86 Ubuntu repos showed that 89% of all code used just 12 instructions (MOV and ADD alone accounting for 50% of all instructions). All 12 of those instructions date back to around 1970. The rest added over the years are a long tail of relatively unused, specialized instructions. This also shows just why more addressable registers and 3-register instructions are SO valuable at reducing "garbage" instructions (even with register renaming and extra registers).
There's still generally a 2-10x performance boost moving from GC+JIT to native. The biggest jump from the 2010 machine to today was less than 2x with a recompile, meaning that even with best-case Java code and updating your JVM religiously for 15 years, your brand new computer with the latest and greatest JVM would still be running slightly slower than the 2010 machine with native code.
That seems like a clear case for native code and not letting it bit-rot for 15+ years between compilations.
This is indeed the big difference with the old Internet. People used to do stuff just because they enjoyed it. That stuff still exists, but now it's drowned out by monetization.
I had turned on the "donations" feature on a very large mod I'd written for a game.
The moment a donation was made ($10) I immediately declined it and disabled the donation feature.
It felt very wrong. I don't like making people pay for enjoying things I've done (I am a terrible businessman) but I also didn't like the feeling that it established a sense of obligation (more than I already felt).
I really, really don't like this new world of monetization. It makes me very uneasy and stressed.
It's the attitude that, just because you're not interested in making this your job, no one should be. If the two of you don't want to, that's great. But other people have decided that they'd rather make this kind of thing their job.
Yet, you have a day job as well, no? You have bills to pay. Getting paid for things you do is not bad. Even if it's a hobby. Of course giving away things for free is a generous thing to do as well :).
If I didn't have a "day" job (it's... just my job), I certainly wouldn't be making enough to survive - or even help - through video monetization of what I do or through donations, though.
Getting paid for things you do is not bad
Feeling obligated is bad when I don't want the obligation - I already feel obligated to update my freeware and support it; I'd rather not pile a monetary responsibility onto my pride-based one. I'd rather see people actually enjoy what I do than have to pay for it (which would likely mean that nobody enjoys it).
I doubt the person you are responding to or the people who upvote him actually get what you are saying. They will never understand why you wouldn't just monetize it anyways. That is the depressing as fuck world we live in today. Most don't see it your way. They see you as some form of luddite.
I resonate with this a little. I'd do the donation link but would want a big red flag saying to only donate if you can afford it, and that it's not needed, just a nice-to-have. Then it would kinda put my mind at ease about the situation.
Oh, that sounds interesting, but I'm not sure it is so obvious to me! Do you mean Reddit gets some money from YouTube for tuning their algorithms to prefer links to the YouTube domain?
People used to do stuff just because they enjoyed it.
and those people had an alternative income source, and the 'do stuff' was just a hobby.
But for the majority of content on the internet today, it is not a hobby but a source of income (directly or indirectly). In return, there's more content to be had (though the quality might be somewhat lower, depending on your tolerance).
It absolutely is not better today overall. It is nearly impossible to find written tutorials or any sort of write up for hobbies anymore. It is all HEY GUYS BLAH BLAH BLASH SMASH MY BUTTON SO HARD PLEASE
You said "nobody." If there's a single person out there who enjoys posting informative content then your statement is wrong. There's obviously a lot more than one such person. Hence your statement is obviously wrong.
I'm not saying there isn't a problem with monetization, with too much content being in video format, etc. I'm not even disagreeing with your stance on the issue. But you asked why you got downvotes, so I told you. Sorry you don't like it?
I suggest you reread each sentence I wrote and consider that it stands regardless of the fact that you were exaggerating, and in fact the exaggeration was likely a *contributor* to the downvotes that, again, you asked a question about yet seem so unhappy to have received an explanation for.
I think it's more that people no longer have the attention span for long form textual content. Content creators are trying to adapt, but at the same time, user attention spans are getting shorter.
Which is only a ridiculous indictment of how incredibly bad literacy has gotten in the last 20-30 years.
I don't have the attention span for these fucking 10 minute videos. I read orders of magnitude faster than people speak. They're literally not worth the time.
I don't have the attention span for these fucking 10 minute videos.
Fucking this. I'm not about to spend 10 minutes staring at the screen in the hopes that some rando is finally going to reveal the one minute of actual content they have, which I'll miss if I lose my concentration for a bit.
I think the more insidious issue is that social media has eroded even our desire to read books. It's designed to hijack our reward circuitry in the same way that drugs do.
And I wish declining attention spans were the only negative side effect of social media use.
If even adults who grew up without social media are affected by it, imagine how much it affects the younger generation who grew up with it.
I’ve referred to it as weaponised ADHD when discussing the design trap of social media with my missus.
My boy struggles to focus and gets twitchy if there isn’t a screen force feeding pap at him constantly.
We are essentially running an uncontrolled experiment on our young to see what the net result is going to be, it would fill me with more horror if that was different to how we’ve parented as a species for at least a few thousand years though… :D
Yeah, it's an insidious mess. I consider myself lucky that whatever weird combo of chemistry is going on in my brain, I never caught the social media bug. Shitposting on Reddit in the evening is as bad as I get, and that's probably in part because it's still all text.
Yup. You cannot speed a video up enough, while still keeping it intelligible, to compete with how fast I can read.
Literacy has tanked in the last 20 years. I cannot believe how bad it has gotten. Just compare reddit posts from 12 years ago, it is like night and day.
You are not capable of actually getting this so I am not going to bother. If you were capable of understanding why this might be dystopian I wouldn't be responding to this comment.
People want to get paid for making stuff. There is nothing dystopian about that, and I find the notion of calling someone's job fake even though people like their product completely hypocritical.
I don't know about /u/Ameisen or this particular video influencer, but what rubs me the wrong way in the general case is:
This looks like small, independent business, but in reality they are total slaves to the platform monopoly. Not unlike mobile app developers.
Of course, that doesn't touch the issue of actual income. From what I've been told, getting money for views is no longer a viable option, so you either sell stuff or you whore yourself out as a living billboard. That makes them less trustworthy by default, because you have to assume a biased opinion. Well, an even more biased opinion.
Not sure about the dystopian part. One might argue that it is a bit scary that those influencers are a major source of information. But as a job... Well, depending on how to look at it. Being an artist was never easy. And as far as grifts are concerned the dynamics of the game are probably pretty average.
A lot of people struggle to make a good salary and pay their bills, but you become the devil if you monetize something on the internet that you're good at.
Or - outside of my valid concerns with the medium in question being used for this kind of content - I am also opposed to the rampant and nigh-ubiquitous commercialization and monetization of everything.
I don't know how old you are, but I did live through times where it wasn't nearly this bad.
People need to make money to eat. Outside of the whole "Capitalism" thing, I don't see how you can consider someone wanting to be paid for their work to be "deeply concerning".
The internet has been shit for the last decade because of this.
You used to find random pages for a particular thing on which someone was extremely proficient and willing to share their knowledge.
You found blogs from people who just wanted to share their views on the world, or their travels around the world, without shoving ads for any particular hotel or restaurant. It was genuine and you could tell. If you saw a recommendation for a product you knew it was because it was a good product (or at least the poster thought so), not because it had a hidden affiliate link.
Nowadays you can't trust anything you see online, because everything that is posted is done so with intent of extracting money, not with the purpose of sharing information.
When a product is free, you're the product. Google is not a benevolent actor.
I don't like videos for this sort of thing. I have cognitive issues following videos in many cases, and I prefer text and graphs. The shift of things becoming videos more often upsets me. I've been seeing documentation become videos.
Then don't watch it and move on. You don't need this information, nobody that gives a shit about performance is running modern code on decades old hardware. This is just an interesting curiosity.
I understand that this particular video is not essential to anyone's life.
It's more a general gripe that changes in monetisation have made getting information much shittier by making us sit through long videos instead of reading quick half-pagers.
Because videos aren't an optimal - or appropriate - medium for all content.
A lot of content lately that's been forced into video form is effectively speech (that would often be better as text), and some of it is pretty much just screenshots or even videos of text.
And yes - you can transcribe a video.
Or... and this is actually far easier to do - you could make it text and images, and if you must have speech, use TTS.
Yep, if we don't allow people to share in whatever medium they do please, they might just not share at all. If someone cares so much, they can do the work of turning into a blog post or something, but I'm just happy we got a video at all!
And someone who is so poor at presenting that I end up having to read the closed captions anyway. So instead of a column of text, I have Speech-To-Text in video form - complete with all the errors.
This makes no sense in this context. A video creator is creating a video with certain content. Are you now saying everyone who releases a video must also maintain a blog that covers everything their videos cover?
This is only a problem when a single/limited source of information releases by video only. E.g. product manuals, patch notes, etc.
Kinda like how we can turn a bunch of bullet points into a professional sounding email and the recipient can have it converted into bullet points... Yay?
Some things are videos, some things are not videos.
You don't say?
You can choose not to engage with content that is a video.
I can also choose to complain about the fact that more and more content - especially content that isn't best presented in video form - is being presented in video form.
The video investigates the performance of modern PCs when running old-style, single-threaded C code, contrasting it with their performance on more contemporary workloads.
Here's a breakdown of the video's key points:
* Initial Findings with Old Code
* The presenter benchmarks a C program from 2002 designed to solve a pentomino puzzle, compiling it with a 1998 Microsoft C compiler on Windows XP [00:36].
* Surprisingly, newer PCs, including the presenter's newest Geekcom i9, show minimal speed improvement for this specific old code, and in some cases, are even slower than a 2012 XP box [01:12]. This is attributed to the old code's "unaligned access of 32-bit words," which newer Intel i9 processors do not favor [01:31].
* A second 3D pentomino solver program, also from 2002 but without the unaligned access trick, still shows limited performance gains on newer processors, with a peak performance around 2015-2019 and a slight decline on the newest i9 [01:46].
* Understanding Performance Bottlenecks
* Newer processors excel at predictable, straight-line code due to long pipelines and branch prediction [02:51]. Old code with unpredictable branching, like the pentomino solvers, doesn't benefit as much [02:43].
* To demonstrate this, the presenter uses a bitwise CRC algorithm with both branching and branchless implementations [03:31]. The branchless version, though more complex, was twice as fast on older Pentium 4s [03:47].
* Impact of Modern Compilers
* Switching to a 2022 Microsoft Visual Studio compiler significantly improves execution times for the CRC tests, especially for the if-based (branching) CRC code [04:47].
* This improvement is due to newer compilers utilizing the conditional move instruction introduced with the Pentium Pro in 1995, which avoids performance-costly conditional branches [05:17].
* Modern Processor Architecture: Performance and Efficiency Cores
* The i9 processor has both performance and efficiency cores [06:36]. While performance cores are faster, efficiency cores are slower (comparable to a 2010 i5) but consume less power, allowing the PC to run quietly most of the time [06:46].
* Moore's Law and Multi-core Performance
* The video discusses that Moore's Law (performance doubling every 18-24 months) largely ceased around 2010 for single-core performance [10:38]. Instead, performance gains now come from adding more cores and specialized instructions (e.g., for video or 3D) [10:43].
* Benchmarking video recompression with FFmpeg, which utilizes multiple cores, shows the new i9 PC is about 5.5 times faster than the 2010 i5, indicating significant multi-core performance improvements [09:15]. This translates to a doubling of performance roughly every 3.78 years for multi-threaded tasks [10:22].
* Optimizing for Modern Processors (Data Dependencies)
* The presenter experiments with evaluating multiple CRCs simultaneously within a loop to reduce data dependencies [11:32]. The i9 shows significant gains, executing up to six iterations of the inner loop simultaneously without much slowdown, highlighting its longer instruction pipeline compared to older processors [12:15].
* Similar optimizations for summing squares also show performance gains on newer machines by breaking down data dependencies [13:08].
* Comparison with Apple M-series Chips
* Benchmarking on Apple M2 Air and M4 Studio chips [14:34]:
* For table-based CRC, the M2 is slower than the 2010 Intel PC, and the M4 is only slightly faster [14:54].
* For the pentomino benchmarks, the M4 Studio is about 1.7 times faster than the i9 [15:07].
* The M-series chips show more inconsistent performance depending on the number of simultaneous CRC iterations, with optimal performance often at 8 iterations [15:14].
* Geekcom PC Features
* The sponsored Geekcom PC (with the i9 processor) features multiple USB-A and USB-C ports (which also support video output), two HDMI ports, and an Ethernet port [16:22].
* It supports up to four monitors and can be easily docked via a single USB-C connection [16:58].
* The presenter praises its quiet operation due to its efficient cooling system [07:18].
* The PC is upgradeable with 32GB of RAM and 1TB of SSD, with additional slots for more storage [08:08].
* Running benchmarks under Windows Subsystem for Linux or with the GNU C compiler on Windows results in about a 10% performance gain [17:32].
* While the Mac Mini's base model might be cheaper, the Geekcom PC offers better value with its included RAM and SSD, and superior upgradeability [18:04].
Please don't post this AI garbage. I know you're trying to be helpful but this crap doesn't do anything to help anyone, especially if/when it contains inaccuracies.
I haven't had a chance to watch the video yet. Are those ads explicit or is it just integrated in the script of the video itself? Either way the Gemini readout makes it pretty obvious when the video is just an ad
Because this is a youtube creator who has been making videos for over a decade. This is his mode of communication.
There are plenty of other bloggers, hobbyists, etc but they are not presented to you in another format because you are honestly lazy and are relying on others to aggregate content for you. If you want different content, seek it out and you will find your niche. Post it here if you think that there's an injustice being done. You will see that there is simply not as big an interest in reading walls of text.
Implying Matthias is money hungry and somehow apart from other passionate educators is such a joke.
There are plenty of other bloggers, hobbyists, etc but they are not presented to you in another format because you are honestly lazy and are relying on others to aggregate content for you.
That's quite the arbitrary judgment.
Implying Matthias is money hungry and somehow apart from other passionate educators is such a joke.
I don't know much about you either, but I do know that your only contribution to this thread is sharing low effort complaints. Comment on the actual content or move on.
I've seen people argue that learning via reading is somehow always a superior method, and that people who don't do that are artificially limiting themselves.
But I tend to dismiss most black-and-white opinions I see from people.
The fact that you don't understand that being a visual learner means utilizing diagrams and visualizations of concepts instead of just being 'visible text', tells me a lot about you being a dumb pedant.
Using your example, a visual learner would benefit from screenshots of the Unreal editor UI with arrows and highlights pointing to specific checkboxes.
There is no such thing as a "visual learner" (or auditory, etc.) anyway; it's been known to be a complete myth for decades. Studies show that all humans benefit most from mixed content types regardless of individual preference.
Using your example, a visual learner would benefit from screenshots of the Unreal editor UI with arrows and highlights pointing to specific checkboxes.
Those people would be far more likely to be designers than programmers.
The same people that Unreal blueprints were designed for.
I'd strongly suggest that you work on your reading comprehension. I'm speaking very deliberately and explicitly. I cannot (and will not), try to clarify further.
Where on God's green earth is being pedantic a disability?
It's the same as saying you have a propensity for splitting hairs or nitpicking. You are literally proving my point, dude.
And, as I very explicitly said, using the term so readily and as what is intended as an insult only really started in the mid-2010s. Seems to be largely a generational term in that regard. I rarely saw it used before then.
Also, ending things with "[my] dude", though that's also dialectal.
I used the term pedant because you were nitpicking and also wrong by telling that guy that reading text was the same as being a visual learner, when that isn't what that means.
Breaking out the DSM5 to try to school me on a disability I have because I used an exceedingly common word is something I've never quite seen before. I have never heard pedant be used as an insult specifically because they were autistic, and I've been at the receiving end of everything from SPED to spaz.
If I say you're compelled to do something, I'm not saying you have obsessive compulsive disorder. There's this weird crybullying thing going on here because you don't want to address my actual complaint. I am insulting you, not because you have a disability, but because you jumped up that guy's ass while being wrong and act like a prick.
You don’t want to watch a video? you can get a summary from Gemini. You don’t want to use AI, then I can’t help you. I guess just don’t consume the information then.
Different people prefer to communicate and consume media and technology differently, your preferences are just that.
I personally like some content in YouTube videos, I can watch/listen to them while I’m doing rote tasks.
Absolutely, they can, and do hallucinate. They can and do get things wrong.
But, I don’t think we should hyper focus on hallucination errors. They are just a kind of error.
Humans make mistakes when transcribing, thinking, etc too. Even with doctors we get second opinions.
I think the primary metric we should be looking at is true information per hour.
Obviously, certain categories (like medicine) require more certainty and should be investigated thoroughly. But, other things, like a YouTube video summary, are pretty low stakes thing to get summarized.
I never proposed and would not propose trusting it blindly.
I measure true information per hour with LLMs the same way I do with humans: classifying which information needs to be true, checking against my mental models, and verifying to varying levels depending on how important the information is.
Once you get your head around “computer speed, human-like fallibility ” it’s pretty easy to navigate.
When true information matters, or you’re asking about a domain where you know the LLM has trouble, adding “provide sources” and then checking the sources is a pretty useful trick.
Simple question: how do you validate an LLM has correctly summarized the contents of a video correctly without knowing the contents of the said video beforehand?
Please explain the steps to perform such validations in simple English.
We’re not discussing human summaries here because no one mentioned a human summarizing a video.
The question remains: how can we validate that an LLM-generated summary is accurate and that we’ve been provided the correct information without prior knowledge of the material?
You made the suggestion, and you should be able to defend it and explain why when asked about it.
I have explained why I think LLMs should be judged by human truth standards not classical computer truth standards.
You’re seemingly insisting on a standard of provable truth, which you can’t get from an LLM. Or a human.
You can judge the correctness rate of an LLM summary the same way you judge the correctness rate of a human summary - test it over a sufficiently large sample and see how accurate it is. Neither humans nor LLMs will get 100% correct.
how do you validate the source material? whatever process you apply when you watch the video, you should apply to the summary as well.
the video is likely a summary of other materials as well.
for a lot of videos it doesn't really matter, there is minimal consequences if the summary or source material is incorrect, it's insignificant.
that's why you won't bother validating the video you're watching but have unreasonable expectations on the third hand interpretation.
ketosoy's point was clear and even you as a human struggled to comprehend it; let's not set unrealistic expectations for a language model when a lot of humans are no better.
It’s really unclear to me where this isn’t connecting. You test LLMs like you test humans. I never said you could do it without human intervention (I think that’s what you mean by manual)
Humans decide what accuracy rate and type is acceptable
Humans set up the test
Humans grade the test
This is approximately how we qualify human doctors and lawyers and engineers. None of those professions have 100% accuracy requirements.
Ohhh, it's not a video for me. When I suspect the video could have been a blog post, I download the subtitles, parse them into plain text, and then read that. Just a few seconds to get the info instead of 19 minutes.
Perhaps - if it doesn't already exist - someone could/should write a wrapper site for YouTube that automatically does this and presents it as a regular page.
If you are referring to the Panto router, he did make a wooden version. Later he sold the rights to the concept to the company that makes the metal one.
I originally found him from the woodworking. Just thought he was some random woodworker in the woods. Then I saw his name in a man page.
He got fuck you money and went and became Norm Abrams. (Or who knows he may consult on the side).
His website has always been McMaster-Carr quality. Straight, to the point, loads fast. I e-mailed asking if he had some templating engine, or a Perl script, or even his own CMS.
CPU threads haven't gotten faster in about ~15 years, so it makes sense that he doesn't see performance increases; all the incremental gains have been in core count more so than clock. I suppose you do have more cache / RAM nowadays as well, but those are rarely the bottleneck in execution timing.
There were clock speeds faster than you realize in 2010, and the vast majority are slower than you realize today. The best on the market is less than 2x faster than chips were 20 years ago. Most chips aren't the 9800X3D. 3GHz isn't unreasonable.
It's really less about the raw speed nowadays and more about the IPC, or instructions per clock. A modern processor like the AMD 9950X3D running at a slower 3GHz would absolutely run laps around a 15-year-old processor running at 5GHz.
It depends. In a single, single-threaded process? Only if it's written in a way that allows it to be run in parallel. That processor running multiple instructions at once still can't run an instruction that depends on the results of another instruction until that first one finishes.
That processor running multiple instructions at once still can't run an instruction that depends on the results of another instruction until that first one finishes.
Branch prediction is an entirely different case and only works at all because there's a finite number of possible states. It does not run dependent operations out of order (I should not have to explain how breaking the laws of causality is impossible); it instead guesses which path a branch will take and starts on it before the instructions that decide which branch actually should run have finished. Runahead is the same idea except it runs both branches.
Branch prediction works when which instruction runs next depends on another instruction. It does not work when the values the instructions work on are dependent.
That is literally what out-of-order execution does at the instruction level, by interleaving execution chains whose data is ready with those still awaiting data in the decode/translate/execute pipeline.
You prevent literal stalls in the pipeline by computing partial results using the values that are ready, and then reordering instructions to make it look like it never happened at all.
Runahead is the same idea except it runs both branches.
What?
No.
That's absolutely not what runahead is. What you just described is speculative execution.
Runahead is a massively more complicated analysis of data-dependent chains of calculations, specifically to address
when the values the instructions work on are dependent.
Out of order execution works on independent instructions. Sorry it still can't break causality.
What they are calling runahead (pick a different term seriously it's already taken) still doesn't actually run a single dependent instruction... look, you clearly don't know what this means. So here's some simple assembly:
add a b
mult a c
Here the multiply cannot execute before the add finishes. It's impossible by all laws of physics.
Introducing a branch doesn't change that. Predicting the right branch doesn't change that. The mult instruction is dependent on the add instruction.
What branch prediction could do is, if this code was more complicated and maybe c was calculated or fetched from memory, it could get started on that. But it would still have to push the mult to the next cycle after the add at absolute minimum.
No amount of trickery will change the fact that it's physically impossible to multiply the value of a by something before the value of a is known. And this is why old code that's not written to be asynchronous can only be sped up so much by things like processors that run multiple instructions at once: at the end of the day, programs are just a series of mathematical operations and they need to be applied in the correct order to get correct results.
I love how you keep repeating this like you think that's a good point or that I've argued that at all.
Yes, obviously to finish a calculation, you must have all the inputs by the end. But processors internally reorganize and reschedule the individual pieces that go into executing individual instructions in order to give the effective appearance that everything happened at once. That's the beauty and insanity of pipelining and execution chains.
So here's some simple assembly:
add a b
mult a c
Christ, it's wild how far some people will go to prove they have no fucking clue what they're talking about just to sound smart. Your idiot child "assembly" is irrelevant in a conversation where we're discussing microcode and the metamagic of processor data/instruction pipelines.
No amount of trickery will change the fact that it's physically impossible to multiply the value of a by something before the value of a is known.
Damn, you got me. It's a real tragedy that all real-world applications are just doing one multiplication at a time. Oh, wait, no. A huge amount of work is actually a wild interleaving of multiple calculations that can be decomposed into independently resolvable chains of calculation and executed in parallel at the pipeline level.
What they are calling runahead (pick a different term seriously it's already taken)
I love the idea that some dipshit Redditor is criticizing actual computer scientists about the usage of words in peer-reviewed work. You're right that the term is "taken", in the sense that it's been used for a decade or more to describe a specific kind of speculative execution intended for resolution of interdependent calculations...but the fact you don't understand that what "they are calling runahead" is the exact thing they are talking about is telling.
I checked your post history, just in case, to be sure I wasn't mistaken in my take on your attitude and skill level. Turns out? I wasn't.
You're an arrogant asshole. And with what justification? Making games in C#?
Go back to shit-talking novice programmers where your experience and understanding gives you an advantage. I've spent the last 20 years writing kernel drivers and distributed storage systems for multi-billion dollar businesses. You're not going to get anywhere with me.
That processor running multiple instructions at once still can't run an instruction that depends on the results of another instruction until that first one finishes.
And that's demonstrably, literally, unequivocally false. The instructions absolutely run at the same time because "an instruction" isn't the unit of work in a CPU; they aren't atomic. They are, in fact, a series of steps which can be interleaved.
This is something every undergrad C.S. student knows by their third year, so what's your excuse, other than refusing to just admit you misspoke?
No, you are obviously ignorant of how execution pipelines work under the covers and have no idea that things like operand forwarding exist for literally this purpose.
If that exceptionally simple example isn't clear enough for you to understand that data-dependent instructions (even at the level of a single arithmetic calculation) can be interleaved at the pipeline level because they are executed in multiple stages, then I guess there's no helping you.
no amount of forwarding lets it happen except after the add.
And nobody ever said it did.
You claimed it was impossible for processors to run instructions before all data was ready.
I pointed out that's not true, because they do run the sub-steps of an instruction at the same time in order to reduce (or completely prevent) pipeline stalls at lowest layers.
And then you turned into a malignant shitlord about it because you couldn't fathom the possibility you were wrong.
that hair was worth splitting
Yes, it was.
If you're gonna speak in absolutes on a very technical topic, be prepared to get called out on technicalities.
Oh please. Tut tut. I, the great debater, used an absolute - no wait, it seems that absolute held true after all.
All you've done is quibble over the definition of words. You're the comic book guy going achually. You've brought nothing to this discussion I didn't already know. Your semantic arguments are meaningless.
"Actually part of the instruction can run"
That's nice. Is it finished? Oh, gee.
Did anything fundamentally change?
No - we're just talking about microcode instead. Just one level of abstraction lower. But you demonstrated earlier that you're incapable of thinking abstractly when I compared it to optimizing compilers so, oh well.
Do I have to produce wiring diagrams to make a claim? Put a little asterisk with a foot note explaining how different materials in the cpu interact?
This is true for a single dependent chain, but program fragments generally comprise many such chains intertwined. Current processors have significantly larger reorder buffers than in the past so they can reach deeper into programs to find and make simultaneous progress on vastly more of these chains.
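A concrete illustration of that (my own sketch, not from the thread): the same reduction written as one long chain versus four independent chains. The second version gives the out-of-order machinery several chains to keep in flight at once, which is exactly what the bigger reorder buffers are there to exploit.

    #include <stddef.h>
    #include <stdint.h>

    /* One long dependency chain: each add has to wait for the previous one. */
    uint64_t sum_one_chain(const uint64_t *a, size_t n) {
        uint64_t s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Four independent chains: a wide core can make progress on all four
     * at once, so it runs closer to its load/ALU throughput limit rather
     * than its add-latency limit.                                         */
    uint64_t sum_four_chains(const uint64_t *a, size_t n) {
        uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)       /* leftover elements */
            s0 += a[i];
        return s0 + s1 + s2 + s3;
    }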
They have indeed. Here's some trend data, with graphs: https://github.com/karlrupp/microprocessor-trend-data
Clock speed plateaued around 2006 because of heat management problems. Single thread performance stopped following Moore's law at that point; it's still going up, but more slowly.
Transistor count and parallel performance has continued to increase exponentially so far, but it's hard to make efficient use of it. Most software is still single-threaded.
Less complicated processors with simpler pipelines do allow for higher clock frequencies. So an older processor could actually run at a higher clock frequency than a modern processor
oh but it does... ever heard of pipelining and superscalar execution? Every CPU nowadays runs 4-5 instructions in parallel... Also a reason why there is such a huge hit from branching
See the other thread, seriously - why do I keep having to say this?
There's a limit to how much the processor can do before it runs into the hard reality of physics. Running instructions in parallel is possible when instructions don't depend on the results of another - in fact the hard part is figuring out which instructions it's safe to run at the same time.
Programs have to be written in a way that allows the processor to do this, and they often are not.
Aka: it fucking depends if you're getting a speed boost, and you still can't break causality to achieve it.
> in fact the hard part is figuring out which instructions it's safe to run at the same time
Depends on what you consider hard... There are known solutions to this - see Tomasulo algorithm.
> Programs have to be written in a way that allows the processor to do this, and they often are not.
It's the compiler's job to make the code suitable, not the program's job. Again, there are countless known solutions - loop unrolling, branch elimination, specialized instructions. Hardware helps too.
A 4-5x speedup from superscalar architecture has been proven.
In terms of programming, there's very little you can do to affect superscalar behavior. The compiler is going to reorder everything to be as optimal as possible anyway. Unless you're on MSVC, which is pure garbage.
>It's not the programmers job to write an efficient program? Huh?
Try doing a sum from 0 to n in a loop; with -O2 you get the constant-time formula. Nevertheless, the discussion is about superscalar architecture, not optimality of algorithms, which is a completely different discussion.
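For the curious, this is the kind of loop meant above; current Clang (and typically GCC) at -O2 replace it with the closed form n*(n+1)/2, which is easy to confirm in a compiler explorer, and it's an algorithmic rewrite no amount of superscalar hardware could do on its own.

    #include <stdint.h>

    /* With -O2, mainstream compilers recognize this induction pattern and
     * emit the closed form n*(n+1)/2 instead of an actual loop.           */
    uint64_t sum_to_n(uint64_t n) {
        uint64_t s = 0;
        for (uint64_t i = 1; i <= n; i++)
            s += i;
        return s;
    }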
>There's no magic involved. Even the most modern super scaled pipelined architecture can stall
Nobody is denying that, but back to your original point.
>All the parallelization in the world won't save you from synchronous code.
More often than not, you will see parallel behavior at the instruction level, so that in a way is saving you from sequential code... Otherwise CPUs would be 2-3x slower than they are.
https://www.cpubenchmark.net/cpu.php?id=6281&cpu=Intel+Core+Ultra+7+258V
here it is; this not only uses less power but is also ~4x faster (at least in this benchmark)
It's a far cry from doubling every 2 years at the peak of the Moore's Law logistic curve. Improvements have greatly slowed, but not to the extent the comment you're replying to is saying.
Maybe for compiled languages, but not for interpreted or JIT-compiled languages, e.g. Java, .NET, C#, Scala, Kotlin, Groovy, Clojure, Python, JavaScript, Ruby, Perl, PHP, etc. New VM interpreters and JIT compilers come with performance and new hardware enhancements, so old code can run faster.
this doesn't contradict the premise. Your program runs faster because new code is running on the computer. That's not a new computer speeding up old code, that's new code speeding up old code. It's actually an example of the fact that you need new code in order to make software run fast on new computers.
The premise is straight-up wrong though. There are plenty of examples of programs and games that have to be throttled so they don't run too fast, and they were written in low-level languages like C. I'm not going to bother watching a video with an obviously incorrect premise to see whether they caveat their statement with a thousand examples of when it's false.
That hasn't been widely true since the early '90s. Games have been using real time clocks (directly or indirectly) for decades. Furthermore, games in particular benefit greatly from massively parallel workloads which is the exact opposite of what this video is talking about. Old games might run hundreds-to-thousands of times faster when you port their code to modern GPUs compared to their original software renderers.
But if you take, say, MS Office 2007 and run it on a machine from 2025, the user experience will be pretty much the same as it was on a computer from the time.
You've changed the subject. GP was referring to games that rely on the underlying timing of the CPU that failed to work correctly on faster computers.
Those games were controlling their pacing (as in how fast the actual game logic/simulation progresses compared to real time) using clocks whose rates were tied to CPU performance.
Since then, they have been using realtime clocks for that purpose and it is not relevant.
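For anyone unfamiliar, the realtime-clock pacing being described looks roughly like this sketch (my own; update_world and render_frame are hypothetical stand-ins): the simulation advances by measured elapsed time, so a faster CPU just finishes each frame sooner instead of making the game run fast.

    #include <time.h>

    void update_world(double dt);   /* hypothetical: advance positions by velocity * dt */
    void render_frame(void);        /* hypothetical renderer                            */

    /* Frame-rate-independent pacing: read a monotonic wall clock and step
     * the simulation by the measured elapsed time, not by "one tick per
     * frame", so game speed no longer depends on CPU speed.              */
    static double now_seconds(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    void game_loop(void) {
        double prev = now_seconds();
        for (;;) {
            double cur = now_seconds();
            double dt  = cur - prev;   /* elapsed real time, not CPU ticks */
            prev = cur;
            update_world(dt);
            render_frame();
        }
    }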
Games having higher frame rates is not the question. The question is whether single-threaded performance has improved on CPUs over time.
Can we please try to hold onto context for more than one comment?
You're referring to a time period that is irrelevant to the point being made in the video that we're all discussing (or not, i guess?).
The time period where games didn't run correctly from one generation of computer to the next was around the same time that Moore's Law was still massively improving single-threaded performance with every CPU generation.
This video is talking about how that trend flattened out.
Go check a graph of Moore's Law... here, I'll make it easy on you: https://ourworldindata.org/moores-law It's almost as if it's still pretty much on track. Sure it's slowed down a bit, but barely. People's perception of computer speeds FEEL like it slowed down because as I mentioned earlier, developers stopped caring about optimization. Why bother, when you have to ship now and computers will keep getting faster? The computers are faster, the software is just getting shittier.
> Sure it's slowed down a bit, but barely. People's perception of computer speeds FEEL like it slowed down because as I mentioned earlier, developers stopped caring about optimization.
I actually use a Zen 2 machine and a Sandy Bridge laptop daily, so they're ~8 years apart, with all the same software. Zen 2 obviously feels faster, which is nice of course, but honestly it's not that much faster as far as single-threaded performance goes. From 8 years of the old doubling-every-two-years progress I should expect a 16x improvement. While some of my own programs do indeed get faster, it's maybe like 2-3x (for the ones I can vouch are actually bound by processor and memory speed -- I can't make a generally fair comparison because the hard drives are different). And in some others there is really no significant difference, maybe 1.1x at most. I will probably get a Zen 5 processor, but I just want AVX-512 (maybe more cores would be nice too), and I don't really expect it to be much faster for normal stuff.
Do some work in a field that requires massive computing power like ML model training and you will see it. This video is shit.
Oh definitely, but I'm guessing the main source of improvement there (ignoring higher core count) will come from routines optimized for specific architectures. So not "old code running faster" -- it's running different code.
Moore's law was never about raw single-core processing speed though, nor about speed at all; it only concerns the number of discrete components on the processor (nowadays pretty much equal to the transistor count)
annnnd you are still wrong, but good job trying to double down on it
In 1965, Moore predicted that the number of transistors on integrated circuits would double annually for the next decade. He revised this prediction in 1975, stating that the doubling would occur every two years.
If an animation (and associated logic, like a bullet reload) is 3x as fast because the frame rate is at 300, is it not the same issue? Instead of CPU clocks we are just talking about frame rates, which depend on CPU performance.
Hardly. Anyone who says Office 2007 will be the same doesn't remember what it was like in 2007. There are significant times when the program is doing tasks that would bring up waiting signs, or just not do anything. Sure, the actual typing part is largely the same because you are the limiting factor, not the computer. If computers didn't matter for execution speed, we would all still be running 8086 chips.
A 2007 PC was likely clocked at 1-2 GHz on two cores. A PC today is often 3-4 GHz, on 8 cores. Maybe 16. Even if we assume perfectly parallelised execution (lol, lmao even) that's not even 2 orders of magnitude.
If something is effectively using a GPU that's a different story, but user software in 2007 was not using the GPU like this, and very little is today either
You do realize "new computers" doesn't just mean CPU. The only reason new code is slow is because no-one bothers to optimize it like they did 20+ years ago.
Is it really that hard to draw the distinction at replacing the CPU?
If you took an old 386 SX and upgraded to a 486 DX the single-threaded performance gains would be MUCH greater than if you replaced an i7-12700 with an i7-13700.
Now I'm wondering if (or when) somebody is going to showcase a program compiled to CPU microcode. Entire functions compiled and "called" using a dedicated assembly instruction.
Someone at Intel was making some experiments, couldn't find more info though: https://www.intel.com/content/dam/develop/external/us/en/documents/session1-talk2-844182.pdf
Funny thing is that only Ruby and Perl, of the languages you listed, are still "interpreted." Maybe also PHP before it's JITed.
Running code in a VM isn't interpreting. And for every major JavaScript engine, it literally compiles to machine language as a first step. It then can JIT-optimize further as it observes runtime behavior, but there's never VM code or any other intermediate code generated. It's just compiled.
There's zero meaning associated with calling languages "interpreted" any more. I mean, if you look, you can find a C interpreter.
Not interested in seeing someone claim that code doesn't run faster on newer CPUs though. It's either obvious (if it's, e.g., disk-bound) or it's nonsensical (if he's claiming faster CPUs aren't actually faster).
Ruby runs as bytecode, and a JIT converts the bytecode to machine code which is executed. Which is really cool because now Ruby can have code which used to be in C re-written in Ruby, and because of YJIT or soon ZJIT, it runs faster than the original C implementation. And more powerful CPUs certainly means quicker execution.
I didn’t watch the video, but I’d guess the point it makes is that new computers have lots of features (increased parallelism; vectorization support; various optimized instructions; …), but none of those would help a (single-threaded) program that was compiled without those features.
Pure single-thread processor frequency has stayed relatively stable (or even went down in comparison to some models) over quite a few years.
So I wonder if it would be possible to make a program that analyses executables, sort of like a decompiler does, with the intent to recompile it to take advantage of newer processors.
GraalVM is a niche that requires specialized handling to get working if it can work at all. There are some Java apps that cannot ever be compiled to GraalVM.
"For executables" is what you've meant to say, because AOT and JIT compilers aren't any different here, as you can compile the old code with a newer compiler version in both cases. Though there is a difference in that a JIT compiler can in theory detect CPU features automatically, while with AOT you have to generally do either some work to add function multi-versioning, or compile for a minimal required or specific architecture.
I can't begin to tell you how complicated it is to do benchmarking like this carefully and well. Simultaneously, while interesting, this is only one leg of how to track performance from generation to generation. But this work is seriously lacking. The control in this video is the code, and there are so many systematic errors in his method that it is difficult to even start taking it apart. Performance tracking is very difficult – it is best left to experts.
As someone who is a big fan of Matthias, this video does him a disservice. It is also not a great source for people to take from. It's fine for entertainment, but it's so riddled with problems, it's dangerous.
The advice I would give to all programmers – ignore stuff like this, benchmark your code, optimize the hot spots if necessary, move on with your life. Shootouts like this are best left to experts.
I don't know if you understand what he's saying. He's pointing out that if you just take an executable from back in the day, you don't get as big of improvements by just running it on a newer machine. That's why he compiled really old code with a really old compiler.
Then he demonstrates how recompiling it can take advantage of knowledge of new processors, and further elucidates that there are things you can do to your code to make more gains (like restructuring branches and multithreading) to get bigger gains than just slapping an old executable on a new machine.
Most people aren't going to be affected by this type of thing because they get a new computer and install the latest versions of everything where this has been accounted for. But some of us sometimes run old, niche code that might not have been updated in a while, and this is important for them to realize.
My point is – I am not sure he understands what he's doing here. Using his data for most programmers to make decisions is not a good idea.
Rebuilding executables, changing compilers and libraries and OS versions, running on hardware that isn't carefully controlled, all of these things add variability and mask what you're doing. The data won't be as good as you think. When you look at his results, I can't say his data is any good, and the level of noise a system could generate would easily hide what he's trying to show. Trust me, I've seen it.
To generally say, "hardware isn't getting faster," is wrong. It's much faster, but as he (~2/3 of the way through the video states) it's mostly by multiple cores. Things like unrolling the loops should be automated by almost all LLVM based compilers (I don't know enough about MS' compiler to know if they use LLVM as their IR), and show that he probably doesn't really know how to get the most performance from his tools. Frankly, the data dependence in his CRC loop is simple enough that good compilers from the 90s would probably be able to unroll for him.
My advice stands. For most programmers: profile your code, squish the hotspots, ship. The performance hierarchy is always: "data structures, algorithm, code, compiler". Fix your code in that order if you're after the most performance. The blanket statement that "parts aren't getting faster," is wrong. They are, just not in the ways he's measuring. In raw cycles/second, yes they've plateaued, but that's not really important any more (and limited by the speed of light and quantum effects). Almost all workloads are parallelizable and those that aren't are generally very numeric and can be handled by specialization (like GPUs, etc.).
In the decades I spent writing compilers, I would tell people the following about compilers:
You have a job as long as you want one. Because compilers are NP-problem on top of NP-problem, you can add improvements for a long time.
Compilers improve about 4%/year, doubling performance (halving execution time) in about 16-20 years. The data bears this out. LLVM was transformative for lots of compilers, and while it's a nasty, slow bitch, it lets lots of engineers target lots of parts with minimal work and generate very good code. But understanding LLVM is its own nightmare.
There are 4000 people on the planet qualified for this job, I get to pick 10. (Generally in reference to managing compiler teams.) Compiler engineers are a different breed of animal. It takes a certain type of person to do the work. You have to be very careful, think a long time, and spend 3 weeks writing 200 lines of code. That's in addition to understanding all the intricacies of instruction sets, caches, NUMA, etc. These engineers don't grow on trees, and finding them takes time and they often are not looking for jobs. If they're good, they're kept. I think the same applies for people who can get good performance measurement. There is a lot of overlap between those last two groups.
I guess you missed the part where I spoke about an old executable. You can't necessarily recompile, because you don't always have the source code. You can't expect the same performance gains from code compiled targeting a Pentium II when you run it on a modern CPU as when you recompile it and possibly make other changes to take advantage of that CPU. That's all he's really trying to show.
I did not in fact miss the discussion of the old executable. My point is that there are lots of variables that need to be controlled for outside the executable. Was a core reserved for the test? What about memory? How were the loader and dynamic loader handled? I-cache? D-cache? File cache? IRQs? Residency? Scheduler? When we are measuring small differences, these noise sources affect things. They are subtle, they are pernicious, and Windows is (notoriously) full of them. (I won't even get to the sample size of executables for measurement, etc.)
I will agree, as a first-or-second-order approximation, calling time ./a.out a hundred times in a loop and taking the median will likely get you close, but I'm just saying these things are subtle, and making blanket statements is fraught with making people look silly.
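A minimal sketch of that crude harness (my own code, assuming POSIX clock_gettime; it deliberately ignores all the noise sources listed above):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    extern void workload(void);            /* whatever you're measuring */

    static int cmp_double(const void *a, const void *b) {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        enum { RUNS = 100 };
        static double samples[RUNS];

        for (int i = 0; i < RUNS; i++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            workload();
            clock_gettime(CLOCK_MONOTONIC, &t1);
            samples[i] = (t1.tv_sec - t0.tv_sec)
                       + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        }

        qsort(samples, RUNS, sizeof samples[0], cmp_double);
        printf("median: %.6f s\n", samples[RUNS / 2]);  /* median, not mean */
        return 0;
    }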
Again, I am not pooping on Matthias. He is a genius, an incredible engineer, and in every way should be idolized (if that's your thing). I'm just saying most of the r/programming crowd should take this opinion with salt. I know he's good enough to address all my concerns, but to truly do this right requires time. I LOVE his videos, and I spent 6 months recreating his gear printing package because I don't have a windows box. (Gear math -> Bezier Path approximations is quite a lot of work. His figuring it out is no joke.) I own the plans for his screw advance jig, and made my own with modifications. (I felt the plans were too complicated in places.) In this instance, I'm just saying, for most of r/programming, stay in your lane, and leave these types of tests to people who do them daily. They are very difficult to get right. Even geniuses like Matthias could be wrong. I say that knowing I am not as smart as he is.
Sounds like you would tell someone that is running an application that is dog slow that "theoretically it should run great, there's just a lot of noise in the system." instead of trying to figure out why it runs so slowly. This is the difference between theoretical and practical computer usage.
I also kind of think you are saying that he is making claims that I don't think he is making. He's really just giving a few examples of why you might not get the performance you might expect when running old executables on a new CPU. He's not claiming that newer computers aren't indeed much faster; he's saying they have to be targeted properly. This is the philosophy of Gentoo Linux: you can get much more performance by running software compiled to target your setup rather than generic, lowest-common-denominator executables. He's not trying to make claims as detailed and extensive as the ones you seem to be discounting.
Just for fun I tested the oldest program I could find that I wrote myself (from 2003), a simple LZ-based data compressor. On an i7-6700 it compressed a test file in 5.9 seconds and on an i3-10100 it took just 1.7 seconds. That's roughly 3.5x faster! How is that even possible when, according to cpubenchmark.net, the i3-10100 should only be about 20% faster? Well, maybe because the i3-10100 has much faster memory installed?
I recompiled the program with VS2022 using default settings. On the i3-10100, the program now runs in 0.75 seconds in x86 mode and in 0.65 seconds in x64 mode. That's like a 250% performance boost!
Then I saw some badly written code... The program was printing progress to the console every single time it wrote compressed data to the destination file... Ouch! After rewriting that to only output the progress when the progress % changes, the program runs in just 0.16 seconds! Four times faster again!
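The fix amounted to something like this (a reconstruction from memory, not the literal code):

    #include <stdio.h>

    /* Before: wrote to the console on every compressed block.
       After: only touch the console when the integer percentage changes.
       The caller initializes *last_pct to -1 before the loop. */
    void report_progress(long done, long total, int *last_pct) {
        int pct = (int)(100.0 * done / total);
        if (pct != *last_pct) {
            *last_pct = pct;
            printf("\rCompressing... %d%%", pct);
            fflush(stdout);
        }
    }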
So, did I really benchmark my program's performance, or console I/O performance? Probably the latter. Was console I/O faster because of the CPU? I don't know; maybe console I/O now has to go through more abstractions, making it slower? I don't really know.
So what did I benchmark? Not just the CPU performance, not even only the whole system hardware (cpu, memory, storage, ...) but the combination of hardware + software.
Moore's law states that "the number of transistors on an integrated circuit will double every two years". So it is not directly about performance. People kind of always get that wrong.
Is it just me who has a totally different understanding of what "code" means?
To me "code" means literally just plain text that follows a syntax. And that can be processed further. But once it's processed, like compiled or whatever, then it becomes an executable artifact.
It's the latter that probably can't be sped up. But code, the plain text, once processed again on a new computer can very much be sped up.
It's not about a specific clock speed, it's about the fact that old games weren't designed with their own internal timing clock independent from the CPU clock.
I was pointing it out as the general reason, not the exact specific reason. Several minigames in FF7 don't do any frame-limiting (the mitigation the second reply discusses), so they'd run super fast on much newer hardware.
Not related to the CPU stuff, as I mostly agree; until very recently I used an i7-2600 as a daily driver for what most would consider a super heavy workload (VMs, Docker stacks, JetBrains IDEs, etc.) and I still use an E8600 on the regular. Something else triggered my geek side.
That Dell Keyboard (the one in front) is the GOAT of membrane keyboards. I collect keyboards, have more than 50 in my collection but that Dell was so far ahead of its time it really stands out. The jog dial, the media controls and shortcuts combined with one of the best feeling membrane actuations ever. Pretty sturdy as well.
I have about 6 of the wired and 3 of the Bluetooth versions of that keyboard to make sure I have them available to me until I cannot type any more.
Do people not remember when 486 computers had a turbo button to allow you to downclock the CPU so that you could run games there were designed for slower CPUs at a slower speed?
That is the point though: we're talking about hardware here, not compilers. He goes into compilers in the video, but the point he makes is that, from a hardware perspective, the biggest increases have come from better compilers and programs (aka writing better software) instead of just faster computers.
For GPUs I would assume it's largely the same; we just put a lot more cores in GPUs over the years, so it seems like the speedup is far greater.
The older the code, the more likely it is to be optimized for particular hardware and with a particular compiler in mind.
Old code using a compiler contemporary with the code won't massively benefit from new hardware, because none of the stack knows about the new hardware (or really the new machine code that the new hardware runs).
If you compiled with a new compiler and tried to run that on an old computer, there's a good chance it can't run.
That is really the point. You need the right hardware+compiler combo.
Most popular programming languages are single-threaded by default. You need to explicitly add multi-threading to make use of multiple cores, which is why you don't see much speedup from adding cores.
With GPUs, the SDKs are oriented towards massively parallelizable operations, so adding cores makes a difference.
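As an illustration (a toy sketch using POSIX threads; all names are made up): nothing below runs on a second core unless the program explicitly asks for it.

    #include <pthread.h>
    #include <stdio.h>

    enum { N = 1 << 20 };
    static double data[N];

    struct job { int lo, hi; double sum; };

    /* Each thread sums its own slice of the array. */
    static void *partial_sum(void *arg) {
        struct job *j = arg;
        double s = 0.0;
        for (int i = j->lo; i < j->hi; i++)
            s += data[i];
        j->sum = s;
        return NULL;
    }

    int main(void) {
        for (int i = 0; i < N; i++)
            data[i] = 1.0;

        struct job jobs[2] = { { 0, N / 2, 0.0 }, { N / 2, N, 0.0 } };
        pthread_t t[2];
        for (int k = 0; k < 2; k++)
            pthread_create(&t[k], NULL, partial_sum, &jobs[k]);
        for (int k = 0; k < 2; k++)
            pthread_join(t[k], NULL);

        printf("total = %f\n", jobs[0].sum + jobs[1].sum);
        return 0;
    }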
Well, it's a little of column A, a little of column B.
CPUs are massively parallel now and do a lot of branch prediction magic etc., but a lot of those features don't kick in without the compiler knowing how to optimize for that CPU.
Like, you can't expect an automatic speedup of single-threaded performance without recompiling the code with a modern compiler; you're basically tying one of the CPU's arms behind its back.
I have limited time, and YouTube videos are all about garnering views. Long gone are the times when people made videos because they were passionate about something. Even in this video you have ads and product placement/recommendations. Yeah, the person is not doing it out of generosity; they want money. Me viewing it is giving them money and supporting the way they do videos. If you want things to change, you have to change how you consume media.
This is incorrect - many games which functioned normally on older processors became unplayably fast on newer computers. Anyone who has played games from the DOS era knows this.
As the industry progressed they focused on FPS. So they executed differently.
theQuandary@reddit
Clockspeeds mean almost nothing here.
lookmeat@reddit
ARM certainly allows an improvement. Anyone using a Mac with an M* CPU would easily attest to this. I do wonder (as personal intuition) if this is fully true, or just the benefit of forcing a recompilation. I think it also can improve certain aspects, but we've hit another limit, fundamental to von Neumann style architectures. We were able to extend it by adding caches on the whole thing, in multiple layers, but this only delayed the inevitable issue.
At this point the cost of accessing RAM dominates so much that as soon as you hit RAM in a way that wasn't prefetched (which is very hard to prevent in the cases that keep happening), the CPU speed barely matters. That is, if there's some time T between page faults in a threaded program, the cost of a page fault is something like 100T (assuming we don't need to hit swap), so the CPU speed is negligible compared to how much time is spent just waiting for RAM. Yes, you can avoid these memory hits, but it requires a careful design of the code that you can't fix at the compiler level alone; you have to write the code differently to take advantage of this.
Hence the issue. Most of the hardware improvements are marginal because we're stuck on the memory bottleneck. This matters because software has been designed with the idea that hardware was going to give exponential improvements. That is, software built ~4 years ago is assumed to run 8x faster by now, but in reality we see improvements of only ~10% of what we saw over the last similar jump. So software feels crappy and bloated, even though the engineering is solid, because it's done with the expectation that hardware alone will fix it. Sadly, that's not the case.
theQuandary@reddit
I believe the real ARM difference is in the decoder (and eliminating all the edge cases), along with some stuff like a looser memory model.
x86 decode is very complex. Find the opcode byte and check if a second opcode byte is used. Check the instruction to see if the mod/register byte is used. If the mod/register byte is used, check the addressing mode to see if you need 0 bytes, 1 displacement byte, 4 displacement bytes, or 1 scaled index byte. And before all of this, there's basically a state machine that encodes all the known prefix byte combinations.
The result of all this stuff is extra pipeline stages and extra branch misprediction penalties. M1 supposedly has a 13-14 cycle penalty while Golden Cove has a 17+ cycle penalty. That alone is an 18-24% improvement at the same clockspeed on this kind of unpredictable code.
Modern systems aren't Von Neumann where it matters. They share RAM and high-level cache between code and data, but these split apart at the L1 level into I-cache and D-cache so they can gain all the benefits of Harvard designs.
"4000MHz" RAM is another lie people believe. The physics of the capacitors in silicon limit cycling of individual cells to 400MHz or 10x slower. If you read/write the same byte over and over, the RAM of a modern system won't be faster than that old Core 2's DDR2 memory and may actually be slower in total nanoseconds in real-world terms. Modern RAM is only faster if you can (accurately) prefetch a lot of stuff into a large cache that buffers the reads/writes.
A possible solution would be changing some percentage of the storage into larger but faster SRAM, then detecting which data is getting these pathological sequential accesses and moving it to the SRAM.
At the same time, Moore's Law also died in the sense that the smallest transistors aren't getting much smaller each node shrink as seen by the failure of SRAM (which uses the smallest transistor sizes) to decrease in size on nodes like TSMC N3E.
Unless something drastic happens at some point, the only way to gain meaningful performance improvements will be moving to lower-level languages.
lookmeat@reddit
A great post! Some additions and comments:
The last part is important. Memory models are important because they define how consistency is kept across multiple copies (in the cache layers as well as RAM). Being able to loosen the requirements means you don't need to sync cache changes at a higher level, nor do you need to keep RAM in sync, which reduces waiting on slower operations.
Yes, but nowadays x86 gets pre-decoded into microcode/micro-ops, which is a RISC-like encoding, and that has most of the advantages of ARM, at least once the code is running.
But yeah, in certain cases the pre-decoding needs to be accounted for, and there are various issues that make things messy.
I think the penalty comes from how long the pipeline is (and therefore how much needs to be redone). I think part of the reason this is fine is that the M1 gets a bit more flexibility in how it spreads power across cores, letting it run at higher speeds without increasing power consumption too much. Intel (and this is my limited understanding, I am not an expert in the field), with no efficient cores, instead uses optimizations such as longer pipelines so that the CPU is able to run "faster" (as in faster wallclock) at lower CPU clock rates.
I agree, which is why I called them "von Neumann style", but the details you mention about it being like a Harvard architecture at the CPU level matter little here.
I argue that the impact of reading from cache is negligible in the long run. It matters, but not too much, and as the M1 showed there's space to improve things there. The reason I claim this is that once you have to hit RAM you get a real impact.
You are completely correct in this paragraph. You also need the CAS latency there. A quick search showed me DDR5-6000 with CL28. Multiply the CAS latency by 2000, divide by the transfer rate, and you get ~9.3 ns true latency. DDR5 lets you load a lot of memory each cycle, but again, here we're assuming you didn't have the memory in cache, so you have to wait. I remember buying RAM and researching latency ~15 years ago, and guess what? Real RAM latency was still ~9 ns.
At 4.8 GHz, that's ~45 cycles of waiting. Now most operations take more than one cycle, but I think my estimate of ~10x waiting is reasonable. When you consider that CPUs nowadays do more operations per cycle (thanks to pipelining and wide execution), you realize that you may have something closer to 100x operations that you didn't do because you were waiting. So CPUs are doing less each time (which is part of why the focus has been on power saving; making CPUs that hog power to run faster is useless because they still end up just waiting most of the time).
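Spelled out, the arithmetic is just this (my numbers, using the DDR5-6000 CL28 example above and a 4.8 GHz core):

    #include <stdio.h>

    int main(void) {
        double cl       = 28.0;    /* CAS latency in memory clock cycles */
        double mt_per_s = 6000.0;  /* DDR5-6000 transfer rate, MT/s      */
        double cpu_ghz  = 4.8;

        /* ns = CL * 2000 / rate, because DDR does 2 transfers per clock. */
        double latency_ns   = cl * 2000.0 / mt_per_s;   /* ~9.3 ns    */
        double stall_cycles = latency_ns * cpu_ghz;     /* ~45 cycles */

        printf("~%.1f ns, ~%.0f CPU cycles per uncached hit\n",
               latency_ns, stall_cycles);
        return 0;
    }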
That said, for the last 10 years most people would "feel" the speedup without realizing that it was because they were saving on swap. Having to access disk, even a really fast M.2 SSD, would be ~10,000-100,000x the wait time in comparison. Having more RAM means that you don't need to push memory pages to disk, and that saves a lot of time.
Nowadays OSes will even "preload" disc memory into RAM, which reduces latency of loading even more. That said when running the program people do not notice the speed increase.
I argue that the increase is minimal. Even halving the latency would still have time being dominated by waiting for RAM.
I think one solution would be to rethink memory architecture. Another is to expose even more "speed features", such as prefetching or reordering, explicitly through the bytecode somehow. Similar to ARM's looser memory model helping the M2 be faster, compilers and others may be able to better optimize prefetching, pipelining, etc. by having context that the CPU just wouldn't have, allowing for things that wouldn't work for every program, but would work for this specific code because of context that isn't inherent to the bytecode itself.
Yeah, I'd argue that happened even before. That said, it was never Moore's law that "efficiency/speed/memory will double every so often"; rather that we'd be able to double the number of transistors in a given space for half the price. There's a point where more transistors are marginal, and in "computer speed" we stopped the doubling sometime in the early 2000s.
I'd argue the opposite: high-level languages are probably the ones best able to take advantage of changes without rewriting code; you would only need to recompile. With low-level languages you need to be aware of these details, so a lot of code needs to be rewritten.
But if you're using the same binary from 10 years ago, well there's little benefit from "faster hardware".
theQuandary@reddit
It doesn't pre-decode per-se. It decodes and will either go straight into the pipeline or into the uop cache then into the pipeline, but still has to be decoded and that adds to the pipeline length. The uop cache is decent for not-so-branchy code, but not so great for other code. I'd also note that people think of uops as small, but they are usually LARGER than the original instructions (I've read that x86 uops are nearly 128-bits wide) and each x86 instruction can potentially decode into several uops.
A study of Haswell showed that integer instructions (like the stuff in this application) were especially bad at using cache with a less than 30% hit rate and the uop decoder using over 20% of the total system power. Even in the best case of all float instructions, the hit rate was just around 45% though that (combined with the lower float instruction rate) reduced decoder power consumption to around 8%. Uop caches have increased in size significantly, but even 4,000 ops for Golden Cove really isn't that much compared to how many instructions are in the program.
I'd also note that the uop cache isn't free. It adds its own lookup latencies and the cache + low-latency cache controller use considerable power and die area. ALL the new ARM cores from ARM, Qualcomm, and Apple drop the uop cache. Legacy garbage costs a lot too. ARM reduced decoder area by some 75% in their first core to drop ARMv8 32-bit (I believe it was A715). This was also almost certainly responsible for the majority of their claimed power savings vs the previous core.
AMD's 2x4 decoder scheme (well, it was written in a non-AMD paper decades ago) is an interesting solution, but adds way more complexity to the implementation trying to track all the branches through cache plus potentially bottlenecking on long code sequences without any branches for the second decoder to work on.
That is partially true, but the clock differences between Intel and something like the M4 just aren't that large anymore. When you look at ARM chips, they need fewer decode stages because there's so much less work to do per instruction and it's so much easier to parallelize. If Intel needs 5 stages to decode and 12 for the rest of the pipeline while Apple needs 1 stage to decode and 12 for everything else, the Apple chip will be doing the same amount of work in the same number of stages at the same clockspeed, but with a much lower branch misprediction penalty.
RISC-V has hint instructions that include prefetch.i which can help the CPU more intelligently prefetch stuff.
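In C, the closest thing most of us ever touch is the GCC/Clang __builtin_prefetch hint; a sketch (pointer chasing is the classic pattern hardware prefetchers can't predict, though whether the hint actually helps depends on how much work is done per node):

    struct node {
        struct node *next;
        long payload;
    };

    /* Hint the next node toward the cache while working on the current one. */
    long walk(const struct node *n) {
        long total = 0;
        while (n) {
            if (n->next)
                __builtin_prefetch(n->next, 0 /* read */, 1 /* low temporal locality */);
            total += n->payload;
            n = n->next;
        }
        return total;
    }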
Unfortunately, I don't think compilers will ever do a good job at this. They just can't reason well enough about the code. The alternative is hand-coded assembly, but x86 (and even ARM) assembly is just too complex for the average developer to learn and understand. RISC-V does a lot better in this regard IMO, though there's still tons to learn. Maybe this is something JITs can do to finally catch up with AOT native code.
The compiler bit in the video is VERY wrong in its argument. There's an archived AnandTech article from the 2003 Athlon 64 launch showing the CPU getting a 10-34% performance improvement just from compiling in 64-bit instead of 32-bit mode. The 64-bit compilers of 2003 were pretty much at their least optimized, and the performance gains were still very big.
The change from 8 GPRs (where they were ALL actually special purpose that could sometimes be reused) to 16 GPRs (with half being truly reusable) along with a better ABI meant big performance increases moving to 64-bit programs. Intel is actually still considering their APX extension which adds 3-register instructions and 32 registers to further decrease the number of MOVs needed (though it requires an extra prefix byte, so it's a very complex tradeoff about when to use what).
An analysis of the x86 Ubuntu repos showed that 89% of all code used just 12 instructions (MOV and ADD alone accounting for 50% of all instructions). All 12 of those instructions date back to around 1970. The rest added over the years are a long tail of relatively unused, specialized instructions. This also shows just why more addressable registers and 3-register instructions is SO valuable at reducing "garbage" instructions (even with register renaming and extra registers).
There's still generally a 2-10x performance boost moving from GC+JIT to native. The biggest jump from the 2010 machine to today was less than 2x with a recompile, meaning that even with best-case Java code and updating your JVM religiously for 15 years, your brand new computer with the latest and greatest JVM would still be running slightly slower than the 2010 machine with native code.
That seems like a clear case for native code and not letting it bit-rot for 15+ years between compilations.
IsThisNameGoodEnough@reddit
He released a video yesterday discussing that exact point:
https://youtu.be/veRja-F4ZMQ
HoratioWobble@reddit
It's also a mobile CPU vs desktop CPUs, which, even if you ignore the throttling, tend to be slower.
Ameisen@reddit
Is there a reason that everything needs to be a video?
ApertureNext@reddit
Because he makes videos and not blog posts.
littlebighuman@reddit
He does also write blog posts. The guy is actually quite a famous woodworker.
arvidsem@reddit
He switched almost entirely to videos for the last year or two. Apparently it's the only way to actually drive engagement now
agumonkey@reddit
ex mechanical engineer, the brain is real
littlebighuman@reddit
Yea worked at Blackberry
omegga@reddit
Monetization
Ameisen@reddit
I'm guessing that nobody enjoys posting informative content just to be informative anymore...
Monetizing it would certainly destroy the enjoyment of it for me.
lazyear@reddit
This is indeed the big difference with the old Internet. People used to do stuff just because they enjoyed it. That stuff still exists, but now it's drowned out by monetization
Ameisen@reddit
I had turned on the "donations" feature on a very large mod I'd written for a game.
The moment a donation was made ($10) I immediately declined it and disabled the donation feature.
It felt very wrong. I don't like making people pay for enjoying things I've done (I am a terrible businessman) but I also didn't like the feeling that it established a sense of obligation (more than I already felt).
I really, really don't like this new world of monetization. It makes me very uneasy and stressed.
Articunos7@reddit
Not sure why you are downvoted. I feel the same as you do. I don't like others paying me for enjoying my projects born out of my hobbies.
EveryQuantityEver@reddit
It's the attitude that, just because you're not interested in making this your job, no one should be. If the two of you don't want to, that's great. But other people have decided that they'd rather make this kind of thing their job.
Articunos7@reddit
I never implied that. People can have donations and they do. I don't judge them
EveryQuantityEver@reddit
The person who started this thread absolutely was implying that, and judging them. That's why they were downvoted.
Glugstar@reddit
That's just your interpretation. I just understood that he was criticizing a societal trend, not the particular individuals.
Like you can criticize drug addiction without criticizing the people who have fallen victims to that addiction.
disasteruss@reddit
You didn’t imply that but the original commenter of this thread explicitly said it.
morinonaka@reddit
Yet, you have a day job as well, no? You have bills to pay. Getting paid for things you do is not bad. Even if it's a hobby. Of course giving away things for free is a generous thing to do as well :).
Ameisen@reddit
If I didn't have a "day" job (it's... just my job), I certainly wouldn't be making enough to survive - or even help - through video monetization of what I do or through donations, though.
The problem is feeling obligations when I don't want them - I already feel obligated to update my freeware and support it; I'd rather not pile a monetary responsibility onto my pride-based one. I'd rather see people actually enjoy what I do than have them pay for it (which would likely mean that nobody enjoys it).
ChampionshipSalt1358@reddit
I doubt the person you are responding to or the people who upvote him actually get what you are saying. They will never understand why you wouldn't just monetize it anyways. That is the depressing as fuck world we live in today. Most don't see it your way. They see you as some form of luddite.
Titch-@reddit
I resonate with this a little. I'd do the donation link but would want a big red flag saying to only donate if they can afford it, and that it's not needed, just a nice-to-have. Then it would kinda put my mind at ease about the situation.
Luke22_36@reddit
The algorithm doesn't favor people doing things for fun because the platforms get a cut of the monetization.
agumonkey@reddit
society and money spoils everything
Reductive@reddit
Where can I find more about reddit getting a cut of the ad money from youtube?
AreWeNotDoinPhrasing@reddit
Obviously Reddit’s “cut” relies on not only their algorithms but on YouTube’s as well.
Reductive@reddit
Oh, that sounds interesting, but I'm not sure it is so obvious to me! Do you mean reddit gets some money from youtube for tuning their algorithms to prefer links to the youtube domain?
Chii@reddit
and those people had an alternative income source, and the 'do stuff' was just a hobby.
But for the majority of content on the internet today, it is not a hobby but a source of income (directly or indirectly). In return, there's more content to be had (though the quality might be somewhat lower, depending on your tolerance).
Overall, it is still better today than in the past.
ChampionshipSalt1358@reddit
It absolutely is not better today overall. It is nearly impossible to find written tutorials or any sort of write up for hobbies anymore. It is all HEY GUYS BLAH BLAH BLASH SMASH MY BUTTON SO HARD PLEASE
Slime0@reddit
Because you started with an obviously false, rather whiney statement.
Ameisen@reddit
There's literally Unreal documentation that links to a YouTube video for how to enable something. So, bullshit on it being "obviously false".
Slime0@reddit
You said "nobody." If there's a single person out there who enjoys posting informative content then your statement is wrong. There's obviously a lot more than one such person. Hence your statement is obviously wrong.
I'm not saying there isn't a problem with monetization, with too much content being in video format, etc. I'm not even disagreeing with your stance on the issue. But you asked why you got downvotes, so I told you. Sorry you don't like it?
Ameisen@reddit
https://en.wikipedia.org/wiki/Hyperbole
Slime0@reddit
I suggest you reread each sentence I wrote and consider that it stands regardless of the fact that you were exaggerating, and in fact the exaggeration was likely a *contributor* to the downvotes that, again, you asked a question about yet seem so unhappy to have received an explanation for.
Ameisen@reddit
I really don't think that your comment should be dignified with a reply.
SIeeplessKnight@reddit
I think it's more that people no longer have the attention span for long form textual content. Content creators are trying to adapt, but at the same time, user attention spans are getting shorter.
NotUniqueOrSpecial@reddit
Which is only a ridiculous indictment of how incredibly bad literacy has gotten in the last 20-30 years.
I don't have the attention span for these fucking 10 minute videos. I read orders of magnitude faster than people speak. They're literally not worth the time.
ShinyHappyREM@reddit
I often just set the playback speed to 1.25 or 1.5.
NotUniqueOrSpecial@reddit
You do understand that even one order of magnitude would be 10x, right?
Maybe someone out there can, but it would be literally impossible for me to listen at anything even close to the speed I can read.
SkoomaDentist@reddit
Fucking this. I'm not about to spend 10 minutes staring at the screen in the hope that some rando is finally going to reveal the one minute of actual content they have, which I'll miss if I lose my concentration for a bit.
SIeeplessKnight@reddit
I think the more insidious issue is that social media has eroded even our desire to read books. It's designed to hijack our reward circuitry in the same way that drugs do.
And I wish declining attention spans were the only negative side effect of social media use.
If even adults who grew up without social media are affected by it, imagine how much it affects the younger generation who grew up with it.
noir_lord@reddit
I’ve referred to it as weaponised ADHD when discussing the design trap of social media with my missus.
My boy struggles to focus and gets twitchy if there isn’t a screen force feeding pap at him constantly.
We are essentially running an uncontrolled experiment on our young to see what the net result is going to be, it would fill me with more horror if that was different to how we’ve parented as a species for at least a few thousand years though… :D
ChampionshipSalt1358@reddit
You are filled with horror you just are burying it deep down and trying to justify it with....that.
Good luck :D
NotUniqueOrSpecial@reddit
Yeah, it's an insidious mess. I consider myself lucky that whatever weird combo of chemistry is going on in my brain, I never caught the social media bug. Shitposting on Reddit in the evening is as bad as I get, and that's probably in part because it's still all text.
ChampionshipSalt1358@reddit
Yup. You cannot speed a video up fast enough while still making it possible to understand that can compete with how fast I can read.
Literacy has tanked in the last 20 years. I cannot believe how bad it has gotten. Just compare reddit posts from 12 years ago, it is like night and day.
condor2000@reddit
No, it is because it is difficult to get paid for text content.
Frankly, I don't have the attention span for most videos and I skip info I would have read as text.
EveryQuantityEver@reddit
This is his job, though. He wants to get paid.
Ameisen@reddit
I know nothing about him. I do know that more and more is being monetized all the time.
I really, really find "being a YouTuber" as a job to be... well, I feel like I'm in a bizarre '90s dystopian film.
EveryQuantityEver@reddit
No, you are just wanting to whine. Producing high quality video content is in fact work, and is a job.
Ameisen@reddit
If you say so.
dontquestionmyaction@reddit
What?
In what universe is it dystopian?
ChampionshipSalt1358@reddit
The one where you were born prior to the internet mattering at all.
dontquestionmyaction@reddit
Because people getting paid for creating things is dystopian?
I feel like the exact opposite is true, isn't it?
ChampionshipSalt1358@reddit
You are not capable of actually getting this so I am not going to bother. If you were capable of understanding why this might be dystopian I wouldn't be responding to this comment.
EveryQuantityEver@reddit
You don't have anything to get. Someone doing what they like for a job is not dystopian.
dontquestionmyaction@reddit
It just sounds like the most boring "waaaah it's not a real job because it's on the internet" take, to be entirely honest.
dontquestionmyaction@reddit
People want to get paid for making stuff. There is nothing dystopian about that, and I find the notion of calling someone's job fake even though people like their product completely hypocritical.
ChampionshipSalt1358@reddit
Lol it's like you are trying to prove my point.
RireBaton@reddit
Only if they don't have corporate overlords, which in a way I guess they still do on YouTube.
hissing-noise@reddit
I don't know about /u/Ameisen or this particular video influencer, but what rubs me the wrong way in the general case is:
superraiden@reddit
ok
Trident_True@reddit
Matthias is a woodworker, that is his job. He used to work at RIM which I assume is where the old code came from.
Blue_Moon_Lake@reddit
USA is obsessed with side hustle.
AVGunner@reddit
A lot of people struggle to make a good salary and pay their bills, but you become the devil if you monetize something on the internet that you're good at.
Ameisen@reddit
Or - outside of my valid concerns with the medium in question being used for this kind of content - I am also opposed to the rampant and nigh-ubiquitous commercialization and monetization of everything.
I don't know how old you are, but I did live through times where it wasn't nearly this bad.
EveryQuantityEver@reddit
People need to make money to eat. Outside of the whole "Capitalism" thing, I don't see how you can consider someone wanting to be paid for their work to be "deeply concerning".
Ameisen@reddit
The Ferengi in Star Trek are not intended to be aspirational.
Everyone should consider rampant commercialization and monetization of everything, including personal data, to be deeply concerning.
EveryQuantityEver@reddit
Nobody is claiming that. But doing this kind of thing? It takes money.
ceene@reddit
The internet has been shit for the last decade because of this.
You used to find random pages for a particular thing on which someone was extremely proficient and willing to share their knowledge.
You found blogs by people who just wanted to share their views on the world, or their travels around the world, without shoving ads for any particular hotel or restaurant in your face. It was genuine and you could tell. If you saw a recommendation for a product you knew it was because it was a good product (or at least the poster thought so), not because it had a hidden affiliate link.
Nowadays you can't trust anything you see online, because everything that is posted is done so with intent of extracting money, not with the purpose of sharing information.
GimmickNG@reddit
One effect of a worsening economy is that monetization of everything becomes more acceptable.
Embarrassed_Quit_450@reddit
It's not just monetizing, it's choosing an inferior format for technical information because it's better at monetization.
Blue_Moon_Lake@reddit
Or maybe they shouldn't have to be struggling while already having a job, and then they wouldn't need to monetize everything?
wompemwompem@reddit
Weirdly defensive take which missed the point entirely lol
farmdve@reddit
It's not just the USA. In most countries worldwide there is a social pressure to earn more.
AVGunner@reddit
Not everyone wants to give their time away for free; just because you want to doesn't mean everyone does.
Ameisen@reddit
Not everyone wants to live on Ferenginar. Deposit 1 strip of latinum.
EveryQuantityEver@reddit
Dude, you can still watch the video for free.
Ameisen@reddit
EveryQuantityEver@reddit
No, you're just wanting to whine. All the stuff you're asking for? It costs money.
MatthewMob@reddit
Most people on the internet used to. You must be too young to remember that.
Ameisen@reddit
I've read your comment. Please deposit 30¢.
blocking-io@reddit
In this economy?
Ameisen@reddit
Localized entirely within your kitchen?
alpacaMyToothbrush@reddit
Just how much money do you think some random nerd makes off of youtube views?
ChrisRR@reddit
Because he wants to
coadtsai@reddit
Easier to follow a few YouTube channels than having to keep track of a bunch of random blogs
(For me personally)
juhotuho10@reddit
I like watching videos?
crackanape@reddit
Because a video drags out 1 minute of reading into 15 minutes of watching.
Trident_True@reddit
Then don't watch it and move on. You don't need this information, nobody that gives a shit about performance is running modern code on decades old hardware. This is just an interesting curiosity.
crackanape@reddit
I understand that this particular video is not essential to anyone's life.
It's more a general gripe that changes in monetisation have made getting information much shittier by making us sit through long videos instead of reading quick half-pagers.
Ameisen@reddit
Because videos aren't an optimal - or appropriate - medium for all content.
A lot of content lately that's been forced into video form is effectively speech (that would often be better as text), and some of it is pretty much just screenshots or even videos of text.
And yes - you can transcribe a video.
Or... and this is actually far easier to do - you could make it text and images, and if you must have speech, use TTS.
BCarlet@reddit
He usually makes wood working videos, and has dipped into a delightful video on the performance of his old software!
lIlIlIIlIIIlIIIIIl@reddit
Yep, if we don't allow people to share in whatever medium they please, they might just not share at all. If someone cares that much, they can do the work of turning it into a blog post or something, but I'm just happy we got a video at all!
Enerbane@reddit
Some things are videos, some things are not videos. You can choose not to engage with content that is a video.
Scatoogle@reddit
Crazy, now extend that logic to comments
Enerbane@reddit
I did.
sebovzeoueb@reddit
Sometimes the thing I want to find out about only exists in video form because no one can be bothered to write articles anymore.
Milumet@reddit
Because no one owes you any free stuff.
EveryQuantityEver@reddit
You're somebody. Get to it.
sebovzeoueb@reddit
I don't publish that much stuff but when I do it's usually in text form
moogle12@reddit
My favorite is when I need just a simple explanation of something, and I can only find a video, and that video has a minute long intro
macrocephalic@reddit
And someone who is so poor at presenting that I end up having to read the closed captions anyway. So instead of a column of text, I have Speech-To-Text in video form - complete with all the errors.
sebovzeoueb@reddit
This is what I'm talking about
burntcookie90@reddit
So these folks should cater to your needs?
sebovzeoueb@reddit
Not necessarily, but I would like them to maybe come across this thread of people saying they don't like the video format and consider doing text.
Cogwheel@reddit
This makes no sense in this context. A video creator is creating a video with certain content. Are you now saying everyone who releases a video must also maintain a blog that covers everything their videos cover?
This is only a problem when a single/limited source of information releases by video only. E.g. product manuals, patch notes, etc.
This kind of content is not the problem.
bphase@reddit
Good thing we've almost gone full circle, and we can now have AI summarize a video and generate that article.
sebovzeoueb@reddit
Kinda like how we can turn a bunch of bullet points into a professional sounding email and the recipient can have it converted into bullet points... Yay?
Cogwheel@reddit
This is not one of those things. People have been reporting on the end of moore's law WRT single-threaded performance for ... decades now?
sebovzeoueb@reddit
I wouldn't know because I didn't watch the video
Ameisen@reddit
You don't say?
I can also choose to complain about the fact that more and more content - especially content that isn't best presented in video form - is being presented in video form.
lIlIlIIlIIIlIIIIIl@reddit
Is there a reason that you have to have this content in another format?
Ameisen@reddit
Let me make a five minute video responding to this comment.
6502zx81@reddit
TLDW.
mr_birkenblatt@reddit
The video investigates the performance of modern PCs when running old-style, single-threaded C code, contrasting it with their performance on more contemporary workloads. Here's a breakdown of the video's key points:

* Initial Findings with Old Code
  * The presenter benchmarks a C program from 2002 designed to solve a pentomino puzzle, compiling it with a 1998 Microsoft C compiler on Windows XP [00:36].
  * Surprisingly, newer PCs, including the presenter's newest Geekcom i9, show minimal speed improvement for this specific old code, and in some cases are even slower than a 2012 XP box [01:12]. This is attributed to the old code's "unaligned access of 32-bit words," which newer Intel i9 processors do not favor [01:31].
  * A second 3D pentomino solver program, also from 2002 but without the unaligned access trick, still shows limited performance gains on newer processors, with a peak around 2015-2019 and a slight decline on the newest i9 [01:46].
* Understanding Performance Bottlenecks
  * Newer processors excel at predictable, straight-line code due to long pipelines and branch prediction [02:51]. Old code with unpredictable branching, like the pentomino solvers, doesn't benefit as much [02:43].
  * To demonstrate this, the presenter uses a bitwise CRC algorithm with both branching and branchless implementations (a rough sketch of the two follows this summary) [03:31]. The branchless version, though more complex, was twice as fast on older Pentium 4s [03:47].
* Impact of Modern Compilers
  * Switching to a 2022 Microsoft Visual Studio compiler significantly improves execution times for the CRC tests, especially for the if-based (branching) CRC code [04:47].
  * This improvement is due to newer compilers utilizing the conditional move instruction introduced with the Pentium Pro in 1995, which avoids performance-costly conditional branches [05:17].
* Modern Processor Architecture: Performance and Efficiency Cores
  * The i9 processor has both performance and efficiency cores [06:36]. While performance cores are faster, efficiency cores are slower (comparable to a 2010 i5) but consume less power, allowing the PC to run quietly most of the time [06:46].
* Moore's Law and Multi-core Performance
  * The video discusses that Moore's Law (performance doubling every 18-24 months) largely ceased around 2010 for single-core performance [10:38]. Instead, performance gains now come from adding more cores and specialized instructions (e.g., for video or 3D) [10:43].
  * Benchmarking video recompression with FFmpeg, which utilizes multiple cores, shows the new i9 PC is about 5.5 times faster than the 2010 i5, indicating significant multi-core performance improvements [09:15]. This translates to a doubling of performance roughly every 3.78 years for multi-threaded tasks [10:22].
* Optimizing for Modern Processors (Data Dependencies)
  * The presenter experiments with evaluating multiple CRCs simultaneously within a loop to reduce data dependencies [11:32]. The i9 shows significant gains, executing up to six iterations of the inner loop simultaneously without much slowdown, highlighting its longer instruction pipeline compared to older processors [12:15].
  * Similar optimizations for summing squares also show performance gains on newer machines by breaking down data dependencies [13:08].
* Comparison with Apple M-series Chips
  * Benchmarking on Apple M2 Air and M4 Studio chips [14:34]:
    * For table-based CRC, the M2 is slower than the 2010 Intel PC, and the M4 is only slightly faster [14:54].
    * For the pentomino benchmarks, the M4 Studio is about 1.7 times faster than the i9 [15:07].
    * The M-series chips show more inconsistent performance depending on the number of simultaneous CRC iterations, with optimal performance often at 8 iterations [15:14].
* Geekcom PC Features
  * The sponsored Geekcom PC (with the i9 processor) features multiple USB-A and USB-C ports (which also support video output), two HDMI ports, and an Ethernet port [16:22].
  * It supports up to four monitors and can be easily docked via a single USB-C connection [16:58].
  * The presenter praises its quiet operation due to its efficient cooling system [07:18].
  * The PC is upgradeable with 32GB of RAM and 1TB of SSD, with additional slots for more storage [08:08].
  * Running benchmarks under Windows Subsystem for Linux or with the GNU C compiler on Windows results in about a 10% performance gain [17:32].
  * While the Mac Mini's base model might be cheaper, the Geekcom PC offers better value with its included RAM and SSD, and superior upgradeability [18:04].
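For readers who want to see what the "branching vs. branchless" CRC comparison in the summary looks like in code, here is a minimal sketch. This is my own illustration, not the presenter's actual source; the polynomial, the names, and the omission of the usual init/final XOR are assumptions made to keep it short.

```c
#include <stdint.h>

/* Bitwise CRC-32 step for one byte, written two ways (reflected polynomial
   0xEDB88320). Illustrative sketch only, not the code from the video. */

/* Branching version: an unpredictable if on the low bit every iteration. */
uint32_t crc32_branchy(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int i = 0; i < 8; i++) {
        if (crc & 1)
            crc = (crc >> 1) ^ 0xEDB88320u;
        else
            crc >>= 1;
    }
    return crc;
}

/* Branchless version: turn the low bit into an all-ones/all-zeros mask,
   so there is no conditional branch for the predictor to miss. */
uint32_t crc32_branchless(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int i = 0; i < 8; i++) {
        uint32_t mask = -(crc & 1);              /* 0xFFFFFFFF or 0x00000000 */
        crc = (crc >> 1) ^ (0xEDB88320u & mask);
    }
    return crc;
}
```

A modern compiler may well turn the `if` in the first version into a conditional move anyway, which is exactly the compiler effect the summary describes.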
lolwutpear@reddit
If AI can get us back to using text instead of having to watch a video for everything, this may be the thing that makes me not hate AI (as much).
I still have no way to confirm that the AI summary is accurate, but maybe it doesn't matter.
BlackenedGem@reddit
It's notoriously unreliable
SLiV9@reddit
TLDR
safrax@reddit
Please don't post this AI garbage. I know you're trying to be helpful but this crap doesn't do anything to help anyone, especially if/when it contains inaccuracies.
mr_birkenblatt@reddit
What's inaccurate?
AreWeNotDoinPhrasing@reddit
I wonder if you can have Gemini remove the ads from the read. I bet you can… that’d be a nice feature.
mr_birkenblatt@reddit
I haven't had a chance to watch the video yet. Are those ads explicit or is it just integrated in the script of the video itself? Either way the Gemini readout makes it pretty obvious when the video is just an ad
involution@reddit
I'm gonna need you to face time me or something mate, you're not making any sense
DanielCastilla@reddit
"Can you hop on a quick call?"
curious_s@reddit
I just twitched...
KrispyCuckak@reddit
I nearly punched the screen...
noir_lord@reddit
“Hey, you got a quick sec?”
Equivalent_Aardvark@reddit
Because this is a youtube creator who has been making videos for over a decade. This is his mode of communication.
There are plenty of other bloggers, hobbyists, etc but they are not presented to you in another format because you are honestly lazy and are relying on others to aggregate content for you. If you want different content, seek it out and you will find your niche. Post it here if you think that there's an injustice being done. You will see that there is simply not as big an interest in reading walls of text.
Implying Matthias is money hungry and somehow apart from other passionate educators is such a joke.
venustrapsflies@reddit
Accusing someone of being lazy for preferring to read instead of watch a video is certainly a take.
Ameisen@reddit
That's quite the arbitrary judgment.
I don't see how or where I implied that at all.
axonxorz@reddit
lol, nice cowardly way to get the last word.
14u2c@reddit
I don't know much about you either, but I do know that your only contribution to this thread is sharing low effort complaints. Comment on the actual content or move on.
__Nerdlit__@reddit
As a predominately visual and auditory learner, I like it.
Ameisen@reddit
As opposed to...?
You generally learn better via auditory or via visual sources.
I'm not sure how one could be predominantly both, unless you just don't have a preference.
tsimionescu@reddit
Well, the gold standard in educational content is university courses and seminars, which tend to be much more similar to a video than to a blog post.
New-Anybody-6206@reddit
I've seen people argue that learning via reading is somehow always a superior method, and that people who don't do that are artificially limiting themselves.
But I tend to dismiss most black-and-white opinions I see from people.
Cyral@reddit
God this site is annoying
Redditor 1: I prefer watching videos
Redditor 2: Here's why you are wrong
crysisnotaverted@reddit
The fact that you don't understand that being a visual learner means utilizing diagrams and visualizations of concepts instead of just being 'visible text', tells me a lot about you being a dumb pedant.
Using your example, a visual learner would benefit from screenshots of the Unreal editor UI with arrows and highlights pointing to specific checkboxes.
jarrabayah@reddit
There is no such thing as a visual (or auditory, etc.) learner anyway; it's been known to be a complete myth for decades. Studies show that all humans benefit most from mixed content types regardless of individual preference.
Ameisen@reddit
Tell me, what subreddit is this?
Those people would be far more likely to be designers than programmers.
The same people that Unreal blueprints were designed for.
crysisnotaverted@reddit
What the actual fuck are you talking about?
Where on God's green earth is being pedantic a disability? Are you thinking of a different word‽ https://www.merriam-webster.com/dictionary/pedant
It's the same as saying you have a propensity for splitting hairs or nitpicking. You are literally proving my point, dude.
Ameisen@reddit
I'd strongly suggest that you work on your reading comprehension. I'm speaking very deliberately and explicitly. I cannot (and will not) try to clarify further.
I did not say that it was.
Asperger's Syndrome
And, as I very explicitly said, using the term so readily and as what is intended as an insult only really started in the mid-2010s. Seems to be largely a generational term in that regard. I rarely saw it used before then.
Also, ending things with "[my] dude", though that's also dialectal.
crysisnotaverted@reddit
I used the term pedant because you were nitpicking and also wrong by telling that guy that reading text was the same as being a visual learner, when that isn't what that means.
Breaking out the DSM5 to try to school me on a disability I have because I used an exceedingly common word is something I've never quite seen before. I have never heard pedant be used as an insult specifically because they were autistic, and I've been at the receiving end of everything from SPED to spaz.
If I say you're compelled to do something, I'm not saying you have obsessive compulsive disorder. There's this weird crybullying thing going on here because you don't want to address my actual complaint. I am insulting you, not because you have a disability, but because you jumped up that guy's ass while being wrong and acted like a prick.
ketosoy@reddit
Paste the video into Gemini and ask for a summary/transcript
Ameisen@reddit
So the solution to people over/misusing a medium is to rely on... another technology that is being overhyped and misused...
ketosoy@reddit
You don’t want to watch a video? You can get a summary from Gemini. You don’t want to use AI? Then I can’t help you. I guess just don’t consume the information then.
Different people prefer to communicate and consume media and technology differently; your preferences are just that.
I personally like some content in YouTube videos, I can watch/listen to them while I’m doing rote tasks.
retornam@reddit
LLMs are known to hallucinate and I wouldn’t blindly trust them to accurately summarize material I have no idea about.
Read this to see how an author asked an LLM to summarize multiple blog posts and it made up stuff in every one of those summaries:
https://amandaguinzburg.substack.com/p/diabolus-ex-machina
ketosoy@reddit
Absolutely, they can, and do hallucinate. They can and do get things wrong.
But, I don’t think we should hyper focus on hallucination errors. They are just a kind of error.
Humans make mistakes when transcribing, thinking, etc too. Even with doctors we get second opinions.
I think the primary metric we should be looking at is true information per hour.
Obviously, certain categories (like medicine) require more certainty and should be investigated thoroughly. But other things, like a YouTube video summary, are pretty low-stakes things to get summarized.
retornam@reddit
So why trust it blindly then when you know it could be feeding you absolute lies?
ketosoy@reddit
I never proposed and would not propose trusting it blindly.
I measure true information per hour with LLMs the same way I do with humans: classifying which information needs to be true, checking against my mental models, and verifying to varying levels depending on how important the information is.
Once you get your head around “computer speed, human-like fallibility,” it’s pretty easy to navigate.
When true information matters, or you’re asking about a domain where you know the LLM has trouble, adding “provide sources” and then checking the sources is a pretty useful trick.
retornam@reddit
Simple question: how do you validate that an LLM has correctly summarized the contents of a video without knowing the contents of said video beforehand?
Please explain the steps to perform such validations in simple English.
Thank you.
ketosoy@reddit
You’re asking the wrong question.
retornam@reddit
We’re not discussing human summaries here because no one mentioned a human summarizing a video.
The question remains: how can we validate that an LLM-generated summary is accurate and that we’ve been provided the correct information without prior knowledge of the material?
You made the suggestion, and you should be able to defend it and explain why when asked about it.
ketosoy@reddit
I have explained why I think LLMs should be judged by human truth standards not classical computer truth standards.
You’re seemingly insisting on a standard of provable truth, which you can’t get from an LLM. Or a human.
You can judge the correctness rate of an LLM summary the same way you judge the correctness rate of a human summary - test it over a sufficiently large sample and see how accurate it is. Neither humans nor LLMs will get 100% correct.
retornam@reddit
How do you test a sufficiently large sample without manual intervention?
Is there a reason you can’t answer that question?
Lachiko@reddit
How do you validate the source material? Whatever process you apply when you watch the video, you should apply to the summary as well. The video is likely a summary of other materials anyway.
For a lot of videos it doesn't really matter; there are minimal consequences if the summary or source material is incorrect, it's insignificant. That's why you won't bother validating the video you're watching but have unreasonable expectations of the third-hand interpretation.
ketosoy's point was clear and even you as a human struggled to comprehend it; let's not set unrealistic expectations for a language model when a lot of humans are no better.
ketosoy@reddit
It’s really unclear to me where this isn’t connecting. You test LLMs like you test humans. I never said you could do it without human intervention (I think that’s what you mean by manual)
This is approximately how we qualify human doctors and lawyers and engineers. None of those professions have 100% accuracy requirements.
Ameisen@reddit
I feel like you completely missed the meaning of my comment, particularly given what you've written and how... rather patronizing it is.
ketosoy@reddit
the normal thing to do here would be to explain what you think was missed.
kcin@reddit
There is a Transcript button in the description where you can read the contents.
myringotomy@reddit
It's mostly to prevent the content from being searchable.
Of course this is going to fail as AI learns to scrape video content too.
Cheeze_It@reddit
Yes. Money. People are broke and need to find more and more desperate ways to make money.
claytonbeaufield@reddit
This person is a well-known YouTuber. He's just using the medium he is known for... There's no conspiracy....
Ameisen@reddit
Where did I claim a conspiracy?
What is it with people showing up accusing me of:
?
I've quite literally done none of these things - I haven't commented on the author at all.
claytonbeaufield@reddit
I can read your mind, young padawan
Praelatuz@reddit
It's giving “I'm not touching you” vibes.
websnarf@reddit
This guy is a semi-famous Youtuber who I've only known to "engage his audience" via video.
Embarrassed_Quit_450@reddit
Not a good one.
No_Mud_8228@reddit
Ohhh, it's not a video for me. When I suspect a video could have been a blog post, I download the subtitles, parse them down to just the text, and then read that. Just a few seconds to get the info instead of 19 minutes.
Ameisen@reddit
Perhaps - if it doesn't already exist - someone could/should write a wrapper site for YouTube that automatically does this and presents it as a regular page.
No_Mud_8228@reddit
There are several, like https://notegpt.io/youtube-transcript-generator
Ameisen@reddit
This is still really bizarre to me.
A page with text and images is trivially turned into speech with TTS.
This is doing it the hard way for no benefit to the content itself (it's usually detrimental instead).
lolwutpear@reddit
Oh don't worry, there's an even bigger industry of taking text content, having an AI voice read it, and making a video out of that.
Articunos7@reddit
You can just click on the show transcript button and read the subtitles without downloading
KrispyCuckak@reddit
Dumbing down of society. People can't read anymore.
Supuhstar@reddit
Some folks find it easier, like us dyslexics
firemark_pl@reddit
Oh I really miss blogs!
retornam@reddit
Me too. I can read faster than I can sit and watch full-length videos.
We are here today (multiple Substacks and videos) because everyone wants to monetize every little thing.
Ameisen@reddit
I miss GeoCities.
ketralnis@reddit
Nobody is making you watch it
bearicorn@reddit
Nobody is making me scratch my balls and sniff them too, but here I am.
CryptoHorologist@reddit
Nobody is making you read comments questioning the format.
suggestiveinnuendo@reddit
a question followed by three bullet points that answer it without unnecessary fluff doesn't make for engagement
BogdanPradatu@reddit
It's annoying enough that the content is not written in this post directly.
Dismal-Detective-737@reddit
https://woodgears.ca/
It's the guy that wrote jhead: https://www.sentex.ca/~mwandel/jhead/
ImNrNanoGiga@reddit
Also invented the PantoRouter
Dismal-Detective-737@reddit
Damn. Given his proclivity to do everything out of wood I assumed he just made a wood version years ago and that's what he was showing off.
Inventing it is a whole new level of engineering. Dude's a true polymath that just likes making shit.
arvidsem@reddit
If you are referring to the Panto router, he did make a wooden version. Later he sold the rights to the concept to the company that makes the metal one.
ImNrNanoGiga@reddit
Yea I knew about his wood stuff before, but not how prolific he is in other fields. He's kinda my role model now.
Dismal-Detective-737@reddit
Don't do that. He's going to turn out to be some Canadian Dexter if we idolize him too much.
alpacaMyToothbrush@reddit
There is a certain type of engineer that's had enough success in life to 'self fund eccentricity'
I hope to join their ranks in a few years
when_did_i_grow_up@reddit
IIRC he was a very early blackberry employee
arvidsem@reddit
Yeah, somewhere in his site are pictures of some of the wooden testing rigs that he built for testing BlackBerry pager rotation.
Here it is: https://woodgears.ca/misc/rotating_machine.html
And a whole set of pages about creatively destroying BlackBerry prototypes that I didn't remember: https://woodgears.ca/cannon/index.html
Kok_Nikol@reddit
It's usually good timing and lots of hard work. I hope you make it!
Dismal-Detective-737@reddit
I originally found him from the woodworking. Just thought he was some random woodworker in the woods. Then I saw his name in a man page.
He got fuck you money and went and became Norm Abrams. (Or who knows he may consult on the side).
His website has always been McMaster-Carr quality. Straight, to the point, loads fast. I e-mailed to ask if he had some templating engine, or a Perl script, or even his own CMS.
Nope, just edited the HTML in a text editor.
pier4r@reddit
The guy wrote a tool to test wood; if you check the videos, it's pretty neat.
Narase33@reddit
Also made a video about how you actually get the air out of the window with a fan. Very useful for hot days with cold nights.
scheppend@reddit
lol that's also why I recognized this guy
https://youtu.be/1L2ef1CP-yw
pier4r@reddit
this sounds like a deal
agumonkey@reddit
what a superb page
14u2c@reddit
Also had a key role in developing the Blackberry.
sarevok9@reddit
"Single threaded code"
CPU threads haven't gotten faster in about ~15 years, so it makes sense that he doesn't see performance increases; the incremental gains have been in core count more so than clock. I suppose you do have more cache/RAM nowadays as well, but those are rarely the bottleneck in execution timing.
freecodeio@reddit
don't remember seeing a 6ghz cpu 15 years ago
Blecki@reddit
Have you seen one today?
There were clock speeds faster than you realize in 2010 and the vast majority are slower than you realize today. The best on the market is less than 2x faster than chips were 20 years ago. Most chips aren't the 9800x3d. 3ghz isn't unreasonable.
My expensive work laptop runs at... 2ghz.
Toastti@reddit
It's really less about the raw speed nowadays and more about the IPC, or instructions per clock. A modern processor like the AMD 9950X3D running at a slower 3GHz would absolutely run laps around a 15-year-old processor running at 5GHz.
Blecki@reddit
It depends. In a single, single-threaded process? Only if it's written in a way that allows it to be run in parallel. A processor running multiple instructions at once still can't run an instruction that depends on the results of another instruction until that first one finishes.
NotUniqueOrSpecial@reddit
They literally can and do.
The most common technique is branch prediction.
It's a technology/approach dating back to the '50s.
There are fancy better things now, obviously, like branch runahead.
So, no, what you said is just...not right.
Blecki@reddit
Branch prediction is an entirely different case and only works at all because there's a finite number of possible states. It does not run dependent operations out of order (I should not have to explain how breaking the laws of causality is impossible); it instead guesses which path a branch will take and starts on it before the instructions that decide which branch should actually run finish. Runahead is the same idea except it runs both branches.
Branch prediction helps when which instruction runs next depends on another instruction. It does not help when the values the instructions work on are dependent.
NotUniqueOrSpecial@reddit
That is literally what out-of-order execution does at the instruction level by interleaving execution chains with ready data in the decode/translate/execution pipeline with those awaiting data.
You prevent literal stalls in the pipeline by computing partial results using the values that are ready, and then reorder instructions to make it look like it never happened at all.
What?
No.
That's absolutely not what runahead is. What you just described is speculative execution.
Runahead is a massively more complicated analysis of data-dependent chains of calculations, specifically to address
Blecki@reddit
Out of order execution works on independent instructions. Sorry it still can't break causality.
What they are calling runahead (pick a different term, seriously, it's already taken) still doesn't actually run a single dependent instruction... look, you clearly don't know what this means. So here's some simple assembly:

    add a b
    mult a c
Here the multiply cannot execute before the add finishes. It's impossible by all laws of physics.
Introducing a branch doesn't change that. Predicting the right branch doesn't change that. The mult instruction is dependent on the add instruction.
What branch prediction could do is, if this code was more complicated and maybe c was calculated or fetched from memory, it could get started on that. But it would still have to push the mult to the next cycle after the add at absolute minimum.
No amount of trickery will change the fact that it's physically impossible to multiply the value of a by something before the value of a is known. And this is why old code that's not written to be asynchronous can only be sped up so much by things like processors that run multiple instructions at once: at the end of the day, programs are just a series of mathematical operations and they need to be applied in the correct order to get correct results.
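To restate the dependency argument in C terms, here is a minimal sketch (my own example, not from the video): every iteration of the loop below needs the previous iteration's result, so no amount of issue width or speculation lets the core start step i+1's multiply before step i has produced its value of h.

```c
#include <stddef.h>
#include <stdint.h>

/* A loop-carried dependency: each update of h reads the h produced by the
   previous iteration (an FNV-1a style hash step, chosen only as an example).
   The out-of-order core can overlap the loads and the loop bookkeeping, but
   the xor/multiply chain itself is strictly serial. */
uint64_t hash_chain(const uint8_t *data, size_t n) {
    uint64_t h = 14695981039346656037ull;        /* assumed seed value */
    for (size_t i = 0; i < n; i++)
        h = (h ^ data[i]) * 1099511628211ull;    /* depends on previous h */
    return h;
}
```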
NotUniqueOrSpecial@reddit
I love how you keep repeating this like you think that's a good point or that I've argued that at all.
Yes, obviously to finish a calculation, you must have all the inputs by the end. But processors internally reorganize and reschedule the individual pieces that go into executing individual instructions in order to give the effective appearance that everything happened at once. That's the beauty and insanity of pipelining and execution chains.
Christ, it's wild how far some people will go to prove they have no fucking clue what they're talking about just to sound smart. Your idiot child "assembly" is irrelevant in a conversation where we're discussing microcode and the metamagic of processor data/instruction pipelines.
Damn, you got me. It's a real tragedy that all real-world applications are just doing one multiplication at a time. Oh, wait, no. A huge amount of work is actually a wild interleaving of multiple calculations that can be decomposed into independently resolvable chains of calculation and executed in parallel at the pipeline level.
I love the idea that some dipshit Redditor is criticizing actual computer scientists about the usage of words in peer-reviewed work. You're right that the term is "taken", in the sense that it's been used for a decade or more to describe a specific kind of speculative execution intended for resolution of interdependent calculations...but the fact you don't understand that what "they are calling runahead" is the exact thing they are talking about is telling.
I checked your post history, just in case, to be sure I wasn't mistaken in my take on your attitude and skill level. Turns out? I wasn't.
You're an arrogant asshole. And with what justification? Making games in C#?
Go back to shit-talking novice programmers where your experience and understanding gives you an advantage. I've spent the last 20 years writing kernel drivers and distributed storage systems for multi-billion dollar businesses. You're not going to get anywhere with me.
Blecki@reddit
Mate, I know more than you and never claimed half the shit you want to grandstand about. Are you capable of reading comprehension?
It's a simple basic fact that these techniques have a ceiling.
Yeah I'm sure you are very smart but you sure can't understand a simple concept here.
NotUniqueOrSpecial@reddit
No, it was literally.
You:
And that's demonstrably, literally, unequivocally false. The instructions absolutely run at the same time because "an instruction" isn't the unit of work in a CPU; they aren't atomic. They are, in fact, a series of steps which can be interleaved.
This is something every undergrad C.S. student knows by their third year, so what's your excuse, other than refusing to just admit you misspoke?
Blecki@reddit
You're willfully ignoring the definition of dependent.
NotUniqueOrSpecial@reddit
No, you are obviously ignorant of how execution pipelines work under the covers and have no idea that things like operand forwarding exist for literally this purpose.
If that exceptionally simple example isn't clear enough for you to understand that data-dependent instructions (even at the level of a single arithmetic calculation) can be interleaved at the pipeline level because they are executed in multiple stages, then I guess there's no helping you.
Blecki@reddit
So the instruction stalls in the middle instead, great, that hair was worth splitting. Still can't do math without the numbers it's mathing.
NotUniqueOrSpecial@reddit
And nobody ever said it did.
You claimed it was impossible for processors to run instructions before all data was ready.
I pointed out that's not true, because they do run the sub-steps of an instruction at the same time in order to reduce (or completely prevent) pipeline stalls at the lowest layers.
And then you turned into a malignant shitlord about it because you couldn't fathom the possibility you were wrong.
Yes, it was.
If you're gonna speak in absolutes on a very technical topic, be prepared to get called out on technicalities.
Blecki@reddit
Oh please. Tut tut. I, the great debater, used an absolute - no wait, it seems that absolute held true after all.
All you've done is quibble over the definition of words. You're the comic book guy going achually. You've brought nothing to this discussion I didn't already know. Your semantic arguments are meaningless.
"Actually part of the instruction can run"
That's nice. Is it finished? Oh, gee.
Did anything fundamentally change?
No - we're just talking about microcode instead. Just one level of abstraction lower. But you demonstrated earlier that you're incapable of thinking abstractly when I compared it to optimizing compilers so, oh well.
Do I have to produce wiring diagrams to make a claim? Put a little asterisk with a foot note explaining how different materials in the cpu interact?
Ridiculous.
type_111@reddit
This is true for a single dependent chain, but program fragments generally comprise many such chains intertwined. Current processors have significantly larger reorder buffers than in the past so they can reach deeper into programs to find and make simultaneous progress on vastly more of these chains.
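As a concrete (and entirely hand-made) illustration of that point about intertwined chains: splitting one accumulator into several independent ones gives a large reorder buffer multiple chains to advance at once, with the partial results combined at the end. A minimal sketch, not code from the video:

```c
#include <stddef.h>
#include <stdint.h>

/* One dependency chain: each addition waits on the previous one. */
uint64_t sum_serial(const uint32_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent chains: the out-of-order core (or the vectorizer)
   can keep all four additions in flight at the same time. */
uint64_t sum_split(const uint32_t *a, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)          /* leftover elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

This is the same trick the video applies when it evaluates several CRCs per loop iteration.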
Blecki@reddit
Yes, see, that's covered under "written in a way that allows it to be run in parallel".
type_111@reddit
Which is tantamount to "any real program"; not a particularly insightful restriction.
Sprudling@reddit
The Pentium 4 had 3.8 GHz 21 years ago, and my 7800X3D now has 4.2 GHz.
I think it's safe to say that core clock speeds have plateaued a long time ago.
currentscurrents@reddit
They have indeed. Here's some trend data, with graphs: https://github.com/karlrupp/microprocessor-trend-data
Clock speed plateaued around 2006 because of heat management problems. Single thread performance stopped following Moore's law at that point; it's still going up, but more slowly.
Transistor count and parallel performance has continued to increase exponentially so far, but it's hard to make efficient use of it. Most software is still single-threaded.
LBPPlayer7@reddit
Intel even had plans for a 4GHz Pentium 4 that they scrapped in favor of pursuing dual-core CPUs like the Pentium 4 D and Core 2 family
pasture2future@reddit
Less complicated processors with simpler pipelines do allow for higher clock frequencies. So an older processor could actually run at a higher clock frequency than a modern processor
Blecki@reddit
Downvoters don't know how computers work. All the parallelization in the world won't save you from synchronous code.
Antagonin@reddit
oh but it does... ever heard of pipelining and superscalar execution? Every CPU nowadays runs 4-5 instructions in parallel... Also a reason why there is such a huge hit from branching
Blecki@reddit
See the other thread, seriously - why do I keep having to say this?
There's a limit to how much the processor can do before it runs into the hard reality of physics. Running instructions in parallel is possible when instructions don't depend on the results of another - in fact the hard part is figuring out which instructions it's safe to run at the same time.
Programs have to be written in a way that allows the processor to do this, and they often are not.
Aka: it fucking depends if you're getting a speed boost, and you still can't break causality to achieve it.
Antagonin@reddit
> in fact the hard part is figuring out which instructions it's safe to run at the same time
Depends on what you consider hard... There are known solutions to this - see Tomasulo algorithm.
> Programs have to be written in a way that allows the processor to do this, and they often are not.
It's the compiler's job to make the code suitable, not the program's job. Again, there are countless known solutions: loop unrolling, branch elimination, specialized instructions. Hardware helps too.
A 4-5x speedup from superscalar architectures has been proven.
Blecki@reddit
It's not the programmers job to write an efficient program? Huh?
This is exactly the same as no amount of compiler optimizations saving you from picking the wrong algorithm.
There's no magic involved. Even the most modern super scaled pipelined architecture can stall.
Antagonin@reddit
In terms of programming, there's very little you can do to affect superscalar behavior. The compiler is going to reorder everything to be as optimal as possible anyway. Unless you're on MSVC, which is pure garbage.
>It's not the programmers job to write an efficient program? Huh?
Try doing a sum from 0 to n in a loop; you get a constant-time formula with -O2. Nevertheless, the discussion is about superscalar architecture, not the optimality of algorithms; that's a completely different discussion.
>There's no magic involved. Even the most modern super scaled pipelined architecture can stall
Nobody is denying that, but back to your original point.
>All the parallelization in the world won't save you from synchronous code.
More often than not, you will see parallel behavior at the instruction level, so that in a way is saving you from sequential code... Otherwise CPUs would be 2-3x slower than they are.
HaMMeReD@reddit
https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core2+Extreme+X7900+%40+2.80GHz&id=1028
2009, 1085 Single Thread Rating
Intel Core Ultra 9 285K Benchmark
2025, 5095 Single Thread rating.
So CPU Single threading has gotten about 5x faster in 15 years. At least based on Cpumark.
droptableadventures@reddit
First one you've got is a low power 44W TDP laptop CPU, second is a 250W TDP desktop CPU - not really a fair comparison.
Pick an early gen i7 desktop CPU for the first one instead - there was a bit of a bump in performance with the i7 but that was really it.
GamerY7@reddit
https://www.cpubenchmark.net/cpu.php?id=6281&cpu=Intel+Core+Ultra+7+258V here it is; this not only uses less power but also is ~4x faster (at least in this benchmark)
HaMMeReD@reddit
Feel free to pick yourself, I didn't spend much time looking up what cpu's came out exactly 15 years ago.
Fidodo@reddit
It's a far cry from doubling every 2 years at the peak of the Moore's law logistical curve. Improvements have greatly slowed, but not to the extent the comment you're replying to is saying.
XelNika@reddit
Moore's law is about transistor density, not performance.
thomasfr@reddit
I upgraded my desktop x86 workstation earlier this year from my previous 2018 one. General single thread performance has doubled since then.
blahblah98@reddit
Maybe for compiled languages, but not for interpreted languages, e.g. Java, .NET, C#, Scala, Kotlin, Groovy, Clojure, Python, JavaScript, Ruby, Perl, PHP, etc. New VM interpreters and JIT compilers come with performance and new hardware enhancements, so old code can run faster.
Cogwheel@reddit
this doesn't contradict the premise. Your program runs faster because new code is running on the computer. That's not a new computer speeding up old code, that's new code speeding up old code. It's actually an example of the fact that you need new code in order to make software run fast on new computers.
caltheon@reddit
The premise is straight-up wrong though. There are plenty of examples of programs and games that have to be throttled in order to not run too fast, and they were written in low-level languages like C. I'm not going to bother watching a video that makes an obviously incorrect premise to see if they caveat their statement with a thousand examples of when it's false.
Cogwheel@reddit
That hasn't been widely true since the early '90s. Games have been using real time clocks (directly or indirectly) for decades. Furthermore, games in particular benefit greatly from massively parallel workloads which is the exact opposite of what this video is talking about. Old games might run hundreds-to-thousands of times faster when you port their code to modern GPUs compared to their original software renderers.
But if you take, say, MS office 2007 and run it on a machine from 2025, the user experience will be pretty much the same on a computer from today as one from the time.
BlueGoliath@reddit
Uh no, even "modern" games(2010 or newer) are "sped up" with higher frame rates. It's just in very subtle ways that aren't immediately obvious.
Cogwheel@reddit
You've changed the subject. GP was referring to games that rely on the underlying timing of the CPU that failed to work correctly on faster computers.
Those games were controlling their pacing (as in how fast the actual game logic/simulation progresses compared to real time) using clocks whose rates were tied to CPU performance.
Since then, they have been using realtime clocks for that purpose and it is not relevant.
Games having higher frame rates is not the question. The question is whether single-threaded performance has improved on CPUs over time.
Can we please try to hold onto context for more than one comment?
caltheon@reddit
No I was not. I was using them as an example that OLD CODE RUNS FASTER ON NEW COMPUTERS, which should be obvious to anyone capable of human thought.
Cogwheel@reddit
You're referring to a time period that is irrelevant to the point being made in the video that we're all discussing (or not, i guess?).
The time period where games didn't run correctly from one generation of computer to the next was around the same time that Moore's law was still massively improving single-threaded performance with every CPU generation.
This video is talking about how that trend flattened out.
caltheon@reddit
Go check and see a graph of Moore's Law....here I'll make it easy on you https://ourworldindata.org/moores-law It's almost as if it's still pretty much on track. Sure it's slowed down a bit, but barely. People's perception of computer speeds FEEL like it slowed down because as I mentioned earlier, developers stopped caring about optimization. Why bother when you have to ship now and computers will keep getting faster. The computers are faster, the software is just getting shittier.
cdb_11@reddit
I actually use daily a Zen2 machine and a Sandy Bridge laptop, so it's ~8 years apart, with all the same software. Zen2 obviously feels faster, which is nice of course, but it's not that much faster honestly, as far as single-threaded performance goes. From 8 years of progress I should expect a 16x improvement. While some of my own programs do indeed get faster, it's maybe like 2-3x (that I can vouch for that it's actually bound by the processor and memory speed -- I can't make a general fair comparison because hard drives are different). And in some others there is really not that much significant difference, maybe 1.1x at most or something. I will probably get a Zen5 processor, but I just want AVX-512 (maybe more cores would be nice too), and I don't really expect it to be that much faster for normal stuff.
Oh definitely, but I'm guessing the main source of improvement there (ignoring higher core count) will come from routines optimized for specific architectures. So not "old code running faster" -- it's running different code.
caltheon@reddit
Moore's law was never about raw single core processing speed though, nor about speed at all, it is only concerning the number of discrete components on the processor (nowadays pretty much equal to the transistor count)
Cogwheel@reddit
No but this thread was until y'all missed the point.
Single threaded performance used to track Moore's law. Now it doesn't. That's the whole point
caltheon@reddit
annnnd you are still wrong, but good job trying to double down on it
Cogwheel@reddit
Transistor density is not single-threaded performance. Most of the benefits of moore's law have been going into multiprocessing power.
BlueGoliath@reddit
You're on Reddit friend.
BlueGoliath@reddit
If an animation (and associated logic, like a bullet being reloaded) is 3x as fast because the frame rate is at 300, is it not the same issue? Instead of CPU clocks we are just talking about framerates, which depend on CPU performance.
Cogwheel@reddit
How is that relevant to the discussion?
BlueGoliath@reddit
CPU performance along with everything else speeds up old code?
Cogwheel@reddit
No one has suggested otherwise. The point is that cpu performance hasn't changed much.
caltheon@reddit
Hardly. Anyone who says Office 2007 will be the same doesn't remember what it was like in 2007. There are significant times when the program is doing tasks that would bring up waiting signs, or just not do anything. Sure, the actual typing part is largely the same because you are the limiting factor, not the computer. If computers didn't matter for execution speed, we would all still be running 8086 chips.
sireel@reddit
A 2007 PC was likely clocked at 1-2GHz on two cores. A PC today is often 3-4GHz, on 8 cores. Maybe 16. Even if we're using perfectly parallelised execution (lol, lmao even) that's not even 2 orders of magnitude.
If something is effectively using a GPU that's a different story, but user software in 2007 was not using the GPU like this, and very little is today either
Cogwheel@reddit
SSDs and RAM capacity were the bigger bottlenecks there. You would be surprised at how many things are still slow.
caltheon@reddit
You do realize "new computers" doesn't just mean CPU. The only reason new code is slow is because no-one bothers to optimize it like they did 20+ years ago.
Cogwheel@reddit
So you're just making a semantic nitpick over the title of the video?
K.
RICHUNCLEPENNYBAGS@reddit
I mean OK but at a certain point like, there’s code even on the processor, so it’s getting to be pedantic and not very illuminating to say
Cogwheel@reddit
Is it really that hard to draw the distinction at replacing the CPU?
If you took an old 386 SX and upgraded to a 486 DX the single-threaded performance gains would be MUCH greater than if you replaced an i7-12700 with an i7-13700.
RICHUNCLEPENNYBAGS@reddit
Sure but why are we limiting it to single-threaded performance in the first place?
Cogwheel@reddit
Because that is the topic of the video 🙃
throwaway490215@reddit
Now I'm wondering if (and when) somebody is going to showcase a program compiled to CPU microcode: entire functions compiled and "called" using a dedicated assembly instruction.
vytah@reddit
Someone at Intel was doing some experiments; I couldn't find more info though: https://www.intel.com/content/dam/develop/external/us/en/documents/session1-talk2-844182.pdf
TimMensch@reddit
Funny thing is that only Ruby and Perl, of the languages you listed, are still "interpreted." Maybe also PHP before it's JITed.
Running code in a VM isn't interpreting. And for every major JavaScript engine, it literally compiles to machine language as a first step. It then can JIT-optimize further as it observes runtime behavior, but there's never VM code or any other intermediate code generated. It's just compiled.
There's zero meaning associated with calling languages "interpreted" any more. I mean, if you look, you can find a C interpreter.
Not interested in seeing someone claim that code doesn't run faster on newer CPUs though. It's either obvious (if it's, e.g., disk-bound) or it's nonsensical (if he's claiming faster CPUs aren't actually faster).
tsoek@reddit
Ruby runs as bytecode, and a JIT converts the bytecode to machine code which is executed. Which is really cool because now Ruby can have code which used to be in C re-written in Ruby, and because of YJIT or soon ZJIT, it runs faster than the original C implementation. And more powerful CPUs certainly means quicker execution.
https://speed.yjit.org/
xADDBx@reddit
I didn’t watch the video, but I’d guess the point it makes is that new computers have lots of features (increased parallelism; vectorization support; various optimized instructions; …), but none of those would help a (single-threaded) program that was compiled without those features.
Pure single-thread processor frequency has stayed relatively stable (or even went down in comparison to some models) over quite a few years.
RireBaton@reddit
So I wonder if it would be possible to make a program that analyses executables, sort of like a decompiler does, with the intent to recompile it to take advantage of newer processors.
voronaam@reddit
Java can be compiled. Look up GraalVM.
Python can be compiled. Check out Codon.
Pretty sure Kotlin has its own native compiler as well.
BlueGoliath@reddit
GraalVM is a niche that requires specialized handling to get working, if it can work at all. There are some Java apps that cannot ever be compiled with GraalVM.
voronaam@reddit
Here's me, running an entire backend layer on it. GraalVM-compiled Java is pretty fast and lean. It does take half an hour to compile though.
But the result is native code; we only compile for x86 and ARM though.
vytah@reddit
And sometimes GraalVM just gives up and your "native binary" is just a bundled JVM and original bytecode.
turudd@reddit
This assumes you:
A) always write in the most modern language style
B) don’t write shit code to begin with.
Hot path optimization can only happen if the compiler reasonably understands what the possible outcomes could be
KaiAusBerlin@reddit
So it's not about the age of the hardware but about the age of the interpreter.
cdb_11@reddit
"For executables" is what you've meant to say, because AOT and JIT compilers aren't any different here, as you can compile the old code with a newer compiler version in both cases. Though there is a difference in that a JIT compiler can in theory detect CPU features automatically, while with AOT you have to generally do either some work to add function multi-versioning, or compile for a minimal required or specific architecture.
nappy-doo@reddit
Retired compiler engineer here:
I can't begin to tell you how complicated it is to do benchmarking like this carefully and well. Simultaneously, while interesting, this is only one leg of how to track performance from generation to generation. But this work is seriously lacking. The control in this video is the code, and there are so many systematic errors in his method that it is difficult to even start taking it apart. Performance tracking is very difficult; it is best left to experts.
As someone who is a big fan of Matthias, this video does him a disservice. It is also not a great source for people to take from. It's fine for entertainment, but it's so riddled with problems, it's dangerous.
The advice I would give to all programmers – ignore stuff like this, benchmark your code, optimize the hot spots if necessary, move on with your life. Shootouts like this are best left to experts.
RireBaton@reddit
I don't know if you understand what he's saying. He's pointing out that if you just take an executable from back in the day, you don't get as big of improvements by just running it on a newer machine. That's why he compiled really old code with a really old compiler.
Then he demonstrates how recompiling it can take advantage of knowledge of new processors, and further elucidates that there are things you can do to your code to make more gains (like restructuring branches and multithreading) to get bigger gains than just slapping an old executable on a new machine.
Most people aren't going to be affected by this type of thing because they get a new computer and install the latest versions of everything where this has been accounted for. But some of us sometimes run old, niche code that might not have been updated in a while, and this is important for them to realize.
nappy-doo@reddit
My point is – I am not sure he understands what he's doing here. Using his data for most programmers to make decisions is not a good idea.
Rebuilding executables, changing compilers and libraries and OS versions, running on hardware that isn't carefully controlled, all of these things add variability and mask what you're doing. The data won't be as good as you think. When you look at his results, I can't say his data is any good, and the level of noise a system could generate would easily hide what he's trying to show. Trust me, I've seen it.
To generally say, "hardware isn't getting faster," is wrong. It's much faster, but as he (~2/3 of the way through the video states) it's mostly by multiple cores. Things like unrolling the loops should be automated by almost all LLVM based compilers (I don't know enough about MS' compiler to know if they use LLVM as their IR), and show that he probably doesn't really know how to get the most performance from his tools. Frankly, the data dependence in his CRC loop is simple enough that good compilers from the 90s would probably be able to unroll for him.
My advice stands. For most programmers: profile your code, squish the hotspots, ship. The performance hierarchy is always: "data structures, algorithm, code, compiler". Fix your code in that order if you're after the most performance. The blanket statement that "parts aren't getting faster," is wrong. They are, just not in the ways he's measuring. In raw cycles/second, yes they've plateaued, but that's not really important any more (and limited by the speed of light and quantum effects). Almost all workloads are parallelizable and those that aren't are generally very numeric and can be handled by specialization (like GPUs, etc.).
In the decades I spent writing compilers, I would tell people the following about compilers:
RireBaton@reddit
I guess you missed the part where I spoke about an old executable. You can't necessarily recompile because you don't always have the source code. You can't expect the same performance gains on code compiled targeting a Pentium II when you run it on a modern CPU as if you recompile it and possible make other considerations to take advantage of it. That's all he's really trying to show.
nappy-doo@reddit
I did not in fact miss the discussion of the old executable. My point is that there are lots of variables that need to be controlled for outside the executable. Was a core reserved for the test? What about memory? How were the loader and dynamic loader handled? i-Cache? D-Cache? File cache? IRQs? Residency? Scheduler? When we are measuring small differences, these noises affect things. They are subtle, they are pernicious, and Windows is (notoriously) full of them. (I won't even get to the point of the sample size of executables for measurement, etc.)
I will agree, as a first-or-second-order approximation, calling
time ./a.out
a hundred times in a loop and taking the median will likely get you close, but I'm just saying these things are subtle, and making blanket statements is fraught with making people look silly.
Again, I am not pooping on Matthias. He is a genius, an incredible engineer, and in every way should be idolized (if that's your thing). I'm just saying most of the r/programming crowd should take this opinion with salt. I know he's good enough to address all my concerns, but to truly do this right requires time. I LOVE his videos, and I spent 6 months recreating his gear printing package because I don't have a Windows box. (Gear math -> Bezier path approximations is quite a lot of work. His figuring it out is no joke.) I own the plans for his screw advance jig, and made my own with modifications. (I felt the plans were too complicated in places.) In this instance, I'm just saying, for most of r/programming, stay in your lane, and leave these types of tests to people who do them daily. They are very difficult to get right. Even geniuses like Matthias could be wrong. I say that knowing I am not as smart as he is.
RireBaton@reddit
Sounds like you would tell someone who is running an application that is dog slow that "theoretically it should run great, there's just a lot of noise in the system" instead of trying to figure out why it runs so slowly. This is the difference between theoretical and practical computer usage.
I also kind of think you are saying that he is making claims that I don't think he is making. He's really just giving a few examples of why you might not get the performance you might expect when running old executables on a new CPU. He's not claiming that newer computers aren't indeed much faster; he's saying they have to be targeted properly. This is the philosophy of Gentoo Linux: that you can get much more performance by running software compiled to target your setup rather than generic, lowest-common-denominator executables. He's not making claims as detailed and extensive as the ones you seem to be discounting.
nappy-doo@reddit
Thanks for the ad hominem attacks. I guess we're done. :)
RireBaton@reddit
Don't be so sensitive. It's a classic developer thing to say. Basically "it works on my box."
remoned0@reddit
Exactly!
Just for fun I tested the oldest program I could find that I wrote myself (from 2003), a simple LZ-based data compressor. On an i7-6700 it compressed a test file in 5.9 seconds and on an i3-10100 it took just 1.7 seconds. More than 300% speed increase! How is that even possible when according to cpubenchmark.net the i3-10100 should only be about 20% faster? Well, maybe because the i3-10100 has much faster memory installed?
I recompiled the program with VS2022 using default settings. On the i3-10100, the program now runs in 0.75 seconds in x86 mode and in 0.65 seconds in x64 mode. That's like a 250% performance boost!
Then I saw some badly written code... The program output the progress to the console every single time it wrote compressed data to the destination file... Ouch! After rewriting that to only output the progress when the progress % changes, the program runs in just 0.16 seconds! Four times faster again!
So, did I really benchmark my program's performance, or maybe console I/O performance? Probably the latter. Was console I/O faster because of the CPU? I don't know, maybe console I/O now requires to go through more abstractions, making it slower? I don't really know.
So what did I benchmark? Not just the CPU performance, not even only the whole system hardware (cpu, memory, storage, ...) but the combination of hardware + software.
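For the curious, the fix described above usually looks something like this: only touch the console when the integer percentage actually changes, instead of after every block written. A minimal sketch with made-up names, not the actual compressor's code:

```c
#include <stdio.h>

/* Report progress at most 101 times (0..100%) instead of once per block.
   Console I/O is far slower than the arithmetic it keeps interrupting. */
void report_progress(long done, long total) {
    static int last_pct = -1;
    int pct = (int)(100.0 * (double)done / (double)total);
    if (pct != last_pct) {
        last_pct = pct;
        fprintf(stderr, "\rCompressing: %d%%", pct);
        fflush(stderr);
    }
}
```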
arvin@reddit
Moore's law states that "the number of transistors on an integrated circuit will double every two years". So it is not directly about performance. People kind of always get that wrong.
https://newsroom.intel.com/press-kit/moores-law
Revolutionary_Ad7262@reddit
https://en.wikipedia.org/wiki/Dennard_scaling is a correct answer
braaaaaaainworms@reddit
I could have sworn I was interviewed by this guy at a giant tech company a week or two ago
dAnjou@reddit
Is it just me who has a totally different understanding of what "code" means?
To me "code" means literally just plain text that follows a syntax. And that can be processed further. But once it's processed, like compiled or whatever, then it becomes an executable artifact.
It's the latter that probably can't be sped up. But code, the plain text, once processed again on a new computer can very much be sped up.
Am I missing something?
RireBaton@reddit
This seems to validate the Gentoo Linux philosophy.
bzbub2@reddit
This post from last week says DuckDB shows speedups of 7-50x on a newer Mac compared to a 2012 Mac: https://duckdb.org/2025/05/19/the-lost-decade-of-small-data.html
mattindustries@reddit
DuckDB is one of the few products I valued so much I used it in production before v1.
jeffwulf@reddit
Then why does my old PC copy of FF7 have the minigames go at ultra speed?
bobsnopes@reddit
https://superuser.com/questions/630769/why-do-some-old-games-run-much-too-quickly-on-modern-hardware
KeytarVillain@reddit
I doubt this is the issue here. FF7 was released in 1997, by this point games weren't being designed for 4.77 MHz CPUs anymore.
IanAKemp@reddit
It's not about a specific clock speed, it's about the fact that old games weren't designed with their own internal timing clock independent from the CPU clock.
bobsnopes@reddit
I was pointing it out as the general reason, not exactly the specific reason. Several mini games in FF7 don’t do any frame-limiting, such as the second reply discusses as a mitigation, so they’d run super fast on much newer hardware.
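The usual mitigation those answers describe is to pace the simulation with a real-time clock instead of "one update per rendered frame". A minimal sketch of the idea, assuming a POSIX monotonic clock (real games typically use a platform-specific high-resolution timer):

```c
#include <stdio.h>
#include <time.h>

/* Advance the "game" in fixed 10 ms steps measured against a wall clock,
   so simulation speed no longer depends on how fast frames can be drawn. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    const double step = 0.010;      /* 10 ms of simulated time per update */
    double prev = now_seconds();
    double accumulator = 0.0;
    double position = 0.0;          /* stand-in for the game state */

    while (position < 100.0) {
        double now = now_seconds();
        accumulator += now - prev;
        prev = now;
        while (accumulator >= step) {    /* catch up in fixed steps */
            position += 25.0 * step;     /* move 25 units per second */
            accumulator -= step;
        }
        /* "render" as often as the hardware allows; game speed is unaffected */
        printf("\rposition = %6.2f", position);
        fflush(stdout);
    }
    printf("\n");
    return 0;
}
```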
jeffwulf@reddit
Hmm, so it speeds up the old code. Got it.
jessek@reddit
Same reason why PCs had turbo buttons
txmail@reddit
Not related to the CPU stuff, as I mostly agree, and until very recently I used an i7-2600 as a daily driver for what most would consider a super heavy workload (VMs, Docker stacks, JetBrains IDEs, etc.) and still use an E8600 on the regular. Something else triggered my geek side.
That Dell Keyboard (the one in front) is the GOAT of membrane keyboards. I collect keyboards, have more than 50 in my collection but that Dell was so far ahead of its time it really stands out. The jog dial, the media controls and shortcuts combined with one of the best feeling membrane actuations ever. Pretty sturdy as well.
I have about 6 of the wired and 3 of the Bluetooth versions of that keyboard to make sure I have them available to me until I cannot type any more.
NiteShdw@reddit
Do people not remember when 486 computers had a turbo button to allow you to downclock the CPU so that you could run games that were designed for slower CPUs at a slower speed?
mccoyn@reddit
I dare you to catch those bouncing babies with turbo on.
caltheon@reddit
yeah, this is a pretty stupid premise and video
Redsoxzack9@reddit
Strange seeing Matthias not doing woodworking
Trident_True@reddit
I keep forgetting that his old job was working at RIM.
NameGenerator333@reddit
I'd be curious to find out if compiling with a new compiler would enable the use of newer CPU instructions, and optimize execution runtime.
ziplock9000@reddit
It has done for decades. Not just that but new architectures
matjam@reddit
He's using a 27-year-old compiler, so I think it's a safe bet.
I've been messing around with procedural generation code recently and started implementing things in shaders and holy hell is that a speedup lol.
AVGunner@reddit
It's the point though: we're talking about hardware and not the compiler here. He goes into compilers in the video, but the point he makes is that, from a hardware perspective, the biggest increases have come from better compilers and programs (aka writing better software) instead of just faster computers.
For gpu's, I would assume it's largely the same, we just put a lot more cores in GPUs over the years so it seems like the speedup is far greater.
Bakoro@reddit
The older the code, the more likely it is to be optimized for particular hardware and with a particular compiler in mind.
Old code using a compiler contemporary with the code, won't massively benefit from new hardware because none of the stack knows about the new hardware (or really the new machine code that the new hardware runs).
If you compiled with a new compiler and tried to run that on an old computer, there's a good chance it can't run.
That is really the point. You need the right hardware+compiler combo.
Embarrassed_Quit_450@reddit
Most popular programming languages are single-threaded by default. You need to explicitly add multi-threading to make use of multiple cores, which is why you don't see much speedup from adding cores.
With GPUs the SDKs are oriented towards massively parallelizable operations. So adding cores makes a difference.
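To make the "you have to opt in" point concrete, here is a minimal C sketch (my own example): the loop only uses multiple cores if you add the OpenMP pragma and build with -fopenmp; without that flag the pragma is ignored and the code stays single-threaded.

```c
/* Build with:  gcc -O2 -fopenmp dot.c
   Without -fopenmp the pragma is ignored and the loop runs on one core. */
double dot(const double *a, const double *b, long n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```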
matjam@reddit
Well, it's a little of column A, a little of column B.
the cpus are massively parallel now and do a lot of branch prediction magic etc but a lot of those features don't happen without the compiler knowing how to optimize for that CPU
https://www.youtube.com/watch?v=w0sz5WbS5AM goes into it in a decent amount of detail but you get the idea.
Like, you can't expect an automatic speedup of single-threaded performance without recompiling the code with a modern compiler; you're basically tying one of the CPU's arms behind its back.
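A small illustration of that: the same C source handed to a modern compiler with architecture flags is allowed to emit instructions a 1998-era build never could. The function and the exact flags below are my own example (not the code from the video), and whether the loop actually vectorizes depends on the compiler version.

```c
#include <stddef.h>

/* A saxpy-style loop. Compiled two ways (hypothetical file name old.c):
 *   cl /O2 old.c                    1990s MSVC       -> plain scalar x86
 *   gcc -O3 -march=native old.c     modern GCC/Clang -> typically auto-vectorized
 *                                                       for the CPU it's built on
 * Same source text, very different machine code. */
void saxpy(float *y, const float *x, float a, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```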
prescod@reddit
He does that about 5 minutes into the video.
Richandler@reddit
Reddit not only doesn't read the articles, they don't watch the videos either.
marius851000@reddit
If only there was a transcript or something... (hmmm... I may download the subtitles and read that)
jabbalaci@reddit
Stop whining. Insert the link in Gemini Pro and ask the AI to summarize the video. Done.
BlueGoliath@reddit
Reddit doesn't have the capacity to understand the material half the time.
Sage2050@reddit
I absolutely do not watch videos on reddit
Articles maybe 50/50
Beneficial-Yam-1061@reddit
What video?
Sufficient_Bass2007@reddit
Watch the video and you will find out.
mr_birkenblatt@reddit
Why give a view if you don't know whether it's going to be worth it? You can't take back the view
Sufficient_Bass2007@reddit
why write a comment on a video if he didn't see the video? Also the answer is yes, obviously.
WarOnFlesh@reddit
Are you running out of views?
mr_birkenblatt@reddit
I have limited time, and YouTube videos are all about garnering views. Long gone are the times when people made videos because they were passionate about something. Even in this video you have ads and product placement/recommendations. Yeah, the person is not doing it out of generosity; they want money. Me viewing it is giving them money and supporting the way they do videos. If you want things to change, you have to change how you consume media.
No-Replacement-3501@reddit
Nobody's reading that text wall. No time. TL;DR?
Slugywug@reddit
Have you watched the video yet?
Slugywug@reddit
Failure to watch the video before commenting gets a tag of c**t
thebigrip@reddit
Generally, it absolutely can. But then the old PCs can't run the new instructions.
mr_birkenblatt@reddit
Old PCs don't fall into the category of "new computers".
KazDragon@reddit
It does, and we're probably talking about a few percentage points here and there with each compiler version upgrade for the big languages.
But if you're still running your code single-threaded, then there's a massive amount of performance resources going to waste.
On top of that, if you're not offloading compute time to the GPU, then there's a massive amount of performance resources going to waste.
On top of that, if you're not offloading to the cloud, there's a massive amount of performance resources going to waste.
Etc.
Simplicity is a virtue, but like everything in engineering, it's a tradeoff.
pasture2future@reddit
Only if the problem size is sufficiently large, and even less so for integer problems like CRC, since processors are already so good at integer execution.
A compiler will vectorize and unroll loops for you… I get that it’s supposed to be a joke, but still…
kisielk@reddit
Depending on the program it might, especially if the compiler can autovectorize loops
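Roughly the distinction (a sketch assuming gcc or clang at -O3; you can check what the compiler did with gcc's -fopt-info-vec): a loop whose iterations are independent usually gets turned into SIMD code on its own, while a loop where each iteration feeds the next, like the CRC-style dependency chain mentioned elsewhere in this thread, gives the vector units nothing to work with.

    #include <stddef.h>
    #include <stdint.h>

    /* independent iterations: compilers at -O3 can usually
     * auto-vectorize this into SSE/AVX adds on their own */
    void add_arrays(float *dst, const float *a, const float *b, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }

    /* loop-carried dependency: each step needs the previous result,
     * so auto-vectorization has nothing to grab onto */
    uint32_t chained(const uint32_t *a, size_t n)
    {
        uint32_t h = 0;
        for (size_t i = 0; i < n; i++)
            h = (h >> 1) ^ a[i];
        return h;
    }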
NoleMercy05@reddit
So what was my turbo button from the 90s for?
TarnishedVictory@reddit
Sure they do, in probably most cases where it's applicable.
Vivid_News_8178@reddit
fuck you
*installs more ram*
XenoPhex@reddit
I wonder if the older machines have been patched for Heartbleed/Spectre/etc.
I know the “fixes” for those issues dramatically slowed down or crushed some long-standing optimizations that the older processors may have relied on.
BlueGoliath@reddit
It's been a while since I last watched this, but from what I remember the "proof" that this was true was a set of horrifically written projects.
StarkAndRobotic@reddit
This is incorrect - many games which functioned normally on older processors became unplayably fast on newer computers. Anyone who has played games from the DOS era knows this. As the industry progressed they focused on FPS, so games executed differently.
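A sketch of why that happened (my illustration, not from the video): many DOS-era games advanced the game state once per pass through the main loop, so the game ran at whatever rate the CPU could spin that loop. Later engines advance the world by measured wall-clock time instead, which is why they play at the same speed on everything.

    /* cpu-bound vs. time-based game loop; the hooks are hypothetical,
     * just to keep the sketch self-contained (uses POSIX clock_gettime) */
    #include <time.h>

    static void update_world(double dt) { (void)dt; }
    static void draw(void) {}
    static int  game_running(void) { return 0; }

    /* old style: one tick per loop pass, so a faster CPU means a faster game */
    void run_cpu_bound(void)
    {
        while (game_running()) {
            update_world(1.0);
            draw();
        }
    }

    /* newer style: advance by elapsed real time, so the game plays at the
     * same speed on a 486 or on a modern CPU */
    void run_time_based(void)
    {
        struct timespec prev, now;
        clock_gettime(CLOCK_MONOTONIC, &prev);
        while (game_running()) {
            clock_gettime(CLOCK_MONOTONIC, &now);
            double dt = (now.tv_sec - prev.tv_sec)
                      + (now.tv_nsec - prev.tv_nsec) / 1e9;
            prev = now;
            update_world(dt);
            draw();
        }
    }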
StendallTheOne@reddit
The problem is that he very likely is comparing desktop CPUs against mobile CPUs like the one in his new PC.
Bevaqua_mojo@reddit
Remove sleep() commands
mccoyn@reddit
I once removed a sleep and the application froze.
cusco@reddit
However, old code runs fast on new computers
ninefourtwo@reddit
Single threaded, I believe.
Embarrassed_Quit_450@reddit
Benchmarks > beliefs
Embarrassed_Quit_450@reddit
Downvoted, fuck videos on Reddit. Text or no deal.
Suspicious-Concert12@reddit
Ralph Recto?