Per Stenström on why we never actually replaced the Von Neumann architecture — and whether we ever will
Posted by WeBeBallin@reddit | programming | View on Reddit | 15 comments
Just interviewed Per Stenström — one of the most prominent computer architects to come out of Europe — and asked him about John Backus's 1977 Turing Award lecture, in which Backus (the inventor of Fortran) coined the term "Von Neumann bottleneck".
That was 49 years ago. Every CPU we've built since has the same architecture.
Per's answer is that the bottleneck never went away — we just got extraordinarily good at hiding it. Cache hierarchies, prefetching, out-of-order execution, speculative execution, cache coherence: the entire post-1980s history of CPU innovation is a stack of workarounds that make the bottleneck invisible for typical workloads without actually removing it.
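To make that concrete, here is a minimal C sketch of the kind of toy experiment that exposes the "hidden" bottleneck (my own illustration, not from the interview; the array size and shuffle are arbitrary). Both loops read every element of the same array exactly once, but the second visits them in a random order, so the cache hierarchy and prefetcher can no longer hide the trips to DRAM:

```c
/* Minimal sketch: same amount of work in both loops, but only the first is
 * prefetcher/cache friendly, so only the first hides the memory bottleneck. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 25)   /* 32M ints, ~128 MB: far larger than any cache */

static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;
static uint64_t xorshift64(void) {                  /* small self-contained PRNG */
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    int *data = malloc((size_t)N * sizeof *data);
    int *perm = malloc((size_t)N * sizeof *perm);
    long long sink = 0;

    for (int i = 0; i < N; i++) { data[i] = i; perm[i] = i; }
    for (int i = N - 1; i > 0; i--) {               /* Fisher-Yates shuffle of the visit order */
        int j = (int)(xorshift64() % (uint64_t)(i + 1));
        int t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }

    double t0 = now();
    for (int i = 0; i < N; i++) sink += data[i];        /* sequential: caches/prefetch hide DRAM */
    double t1 = now();
    for (int i = 0; i < N; i++) sink += data[perm[i]];  /* shuffled: most accesses miss to DRAM */
    double t2 = now();

    printf("sequential %.2fs  shuffled %.2fs  (sink=%lld)\n", t1 - t0, t2 - t1, sink);
    free(data); free(perm);
    return 0;
}
```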
His take on why we haven't replaced the architecture is essentially legacy — the software ecosystem built on Von Neumann is so vast that migrating to anything fundamentally different would cost decades of investment. His sharper point is that Von Neumann isn't "right" in any absolute sense: the architecture has to be in harmony with the underlying technology, and semiconductors happen to support what Von Neumann needs.
The thread I really wanted his read on was whether we'll ever see a genuine shift away from Von Neumann, or whether AI just pulls another generation of workarounds out of us. After 40+ years in the field he's honestly skeptical. He gave phase change memory as a recent cautionary tale: non-volatile, high-density, performance-competitive with DRAM, Intel and Micron poured huge money into it — and it died because of legacy. Even when a clearly viable alternative shows up, the cost of changing everything built around the current architecture tends to win.
The candidates he treats seriously are processing-in-memory (compute units distributed inside the memory itself — though he was honest this might be Von Neumann with a better layout rather than a genuine break) and entirely new substrates like quantum, which are a different paradigm but probably won't replace classical for general-purpose work.
I’d love a take on this from anyone closer to AI accelerator design or new-substrate work.
Link to full conversation here:
programming-ModTeam@reddit
No content written mostly by an LLM. If you don't want to write it, we don't want to read it.
No-Performance-785@reddit
The M1 is a Harvard chip dressed up as a Von Neumann architecture, and it has proven that a new architecture can do just as well while maintaining backward compatibility.
ElderberryPrevious19@reddit
Many modern processors are closer to Harvard than Von Neumann, so I think it's more a philosophical question than a logical one. Internally they tend to have separate paths for instructions and data, with separate caches at level 1.
currentscurrents@reddit
Harvard and Von Neumann are just minor variants on each other. They both suffer from the memory bandwidth bottleneck because memory is physically separate from compute.
When people talk about non-Von Neumann they usually mean radically different architectures like cellular automata, neural networks, etc.
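A back-of-the-envelope way to see that bandwidth bottleneck, with made-up numbers (only the ratio matters, not the specific figures):

```c
/* Roofline-style arithmetic; the peak-FLOPs and bandwidth values below are
 * invented for illustration, not measurements of any specific chip. */
#include <stdio.h>

int main(void) {
    double peak_gflops = 500.0;   /* hypothetical peak compute, GFLOP/s */
    double mem_gbps    = 50.0;    /* hypothetical DRAM bandwidth, GB/s  */

    /* Machine balance: FLOPs you must do per byte moved to stay compute-bound. */
    double balance = peak_gflops / mem_gbps;          /* = 10 FLOP/byte here */

    /* A dot product of doubles does 2 FLOPs per 16 bytes loaded. */
    double dot_intensity = 2.0 / 16.0;                /* = 0.125 FLOP/byte */

    printf("need %.1f FLOP/byte, dot product delivers %.3f -> %.0fx short of peak\n",
           balance, dot_intensity, balance / dot_intensity);
    return 0;
}
```

With numbers like these, a memory-bound kernel uses roughly 1% of the compute the chip can theoretically deliver, regardless of whether instructions and data share a bus.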
aanzeijar@reddit
Wouldn't change anything, would it? We already have FPGAs as a hybrid cellular approach. We have graphics pipelines to compute in parallel. It turns out both of them run into the same memory barriers once you try to do real work with them.
SkoomaDentist@reddit
Not to mention there have been a bunch of widely used CPUs that use a literal Harvard architecture, i.e. separate address spaces for instructions and data. Many DSPs and older MCUs used it, with devices probably numbering in the billions by now, so it's not exactly a niche use case either.
Harvard-lite, i.e. separate buses and storage (not just separate caches) for instructions and data, is probably used by over 90% of MCUs.
gimpwiz@reddit
Plenty of brand new designs use separate program and memory spaces, too.
SkoomaDentist@reddit
Harvard-lite is absolutely common (it's almost an inevitable aspect of typical MCUs that use flash for code and SRAM for data), but which modern designs actually use entirely separate address spaces for code and data?
That tends to lead to unpleasantness with C semantics if/when the code space is used for both code and fixed constant tables, since pointer accesses then need to check whether the pointer points to RAM or ROM.
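A minimal sketch of that unpleasantness, taking AVR as the true-Harvard example and assuming avr-libc's pgmspace API:

```c
/* Sketch of the "is this pointer RAM or ROM?" problem on a true Harvard MCU
 * (AVR here, via avr-libc). A const table placed in flash lives in a different
 * address space, so a plain C pointer dereference reads the wrong memory and
 * every "generic" routine needs a _P twin or an explicit tag. */
#include <avr/pgmspace.h>
#include <string.h>

static const char table_in_flash[] PROGMEM = "CONST DATA IN CODE SPACE";
static char table_in_ram[] = "DATA IN SRAM";

/* A "generic" copy has to be told which space the source pointer refers to;
 * the pointer value alone can't say, since flash and SRAM addresses overlap. */
static void copy_any(char *dst, const char *src, size_t n, int src_is_flash) {
    if (src_is_flash)
        memcpy_P(dst, src, n);   /* reads code space (LPM instructions) */
    else
        memcpy(dst, src, n);     /* ordinary data-space load */
}

int main(void) {
    char buf[32];
    copy_any(buf, table_in_flash, sizeof table_in_flash, 1);
    copy_any(buf, table_in_ram,   sizeof table_in_ram,   0);
    /* Passing table_in_flash to plain memcpy would silently read SRAM at the
     * same numeric address instead -- exactly the unpleasantness above. */
    return 0;
}
```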
gimpwiz@reddit
That's a fair point, I may not have been precise enough. I'll need to review the details, I suspect you're right.
I'm working with RISC-V soft cores right now, and the one I'm using ... hmm. I could have sworn the SiFive core uses a separate BRAM for program space and memory. Well, it does, but the question is whether those are fully separate address spaces or not. I'll have to check. Thanks.
SkoomaDentist@reddit
They share the same address space. Separate address spaces don't really help with modern transistor counts, so the data-vs-code cross-switch can just be added at the bus interfaces. E.g. 0x0 - 0x10… would be SRAM and 0x80.. - 0x90… would be flash; the CPU's I and D buses have direct fast paths to flash and SRAM respectively, and everything else goes through a common bus arbitrator that uses the highest bits of the address to select the actual memory-side bus.
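Roughly, that decode looks like the sketch below; the map (SRAM at 0x0, flash at 0x8000_0000) is invented for illustration and is not the actual SiFive memory map:

```c
/* Toy model of a bus arbitrator picking the memory-side bus from the top
 * address bits. Both the I-bus and D-bus feed this, so code and data share
 * one address map even though the fast paths are separate. */
#include <stdint.h>
#include <stdio.h>

typedef enum { BUS_SRAM, BUS_FLASH, BUS_PERIPH } bus_t;

static bus_t decode(uint32_t addr) {
    switch (addr >> 28) {          /* top nibble selects the region */
    case 0x0: return BUS_SRAM;     /* 0x0000_0000.. : SRAM  */
    case 0x8: return BUS_FLASH;    /* 0x8000_0000.. : flash */
    default:  return BUS_PERIPH;   /* everything else via the common arbitrator */
    }
}

int main(void) {
    printf("%d %d %d\n", decode(0x00001000), decode(0x80004000), decode(0x40000000));
    return 0;
}
```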
brunhilda1@reddit
Whatever happened to that "paper loop tape with multiple read-write head" architecture that was so big on Slashdot 10-15 years ago?
gimpwiz@reddit
Intel is willing to spend billions on a product like ... uh, did they call it x-point? I only remember the codenames. Anyway, they're willing to invest, but never actually commit long term. Everyone knows this, so big customers don't buy Intel products that aren't CPUs, so Intel never gets traction and kills products. It's a vicious cycle. I'm not saying this is the sole cause of their failure on the Micron-partnered memory, but it certainly contributed. Pretty much the only non-CPU product I've seen have any real success recently was the SSD line, which of course is now shut down too.
jplindstrom@reddit
Would love to listen to this as an actual podcast, but can't find one anywhere.
A 2-hour YouTube video isn't going to happen, unfortunately.
namezam@reddit
I wonder if the proliferation of AI would sway the adoption estimate from decades to years or less. If there's one thing AI coding excels at, it's converting one architecture to another. Every business decision, use case, edge case, UI, and UX is already in place; just lift and place, then run the tests. No tests? Well, that's lucky, AI excels at that as well. Look at how wildly fast Bun was rewritten in Rust: two weeks ago it was laughed at, last week it was called half-baked, two days ago it passed 99% of test cases, today it was merged.
It isn't quite a silver bullet, but whatever its shortcomings are now will be largely mitigated in the next few years, certainly within the timeframe of ramping up a new hardware architecture.
I truly believe ARM Windows was saved by AI. Adoption was slow, compatibility was hit or miss, and Intel seized the initiative to build out a competitor architecture; by all measures the Snapdragon chip line should be dead, dead. But somehow, over the last year or so, the vast majority of my issues have been addressed. Not just by Microsoft and Qualcomm, but through updates to existing apps that I thought had abandoned Windows on ARM years ago.
lood9phee2Ri@reddit
Quite a lot of modern deep-embedded stuff remains Harvard architecture. Though von Neumann is sort of growing there too, I suppose.
https://en.wikipedia.org/wiki/Harvard_architecture#Modern_uses_of_the_Harvard_architecture