Intel and AMD Form x86 Ecosystem Advisory Group to Accelerate Innovation for Developers and Customers
Posted by bizude@reddit | hardware | View on Reddit | 72 comments
the_dude_that_faps@reddit
The faster both AMD and Intel adopt technologies like X86S, APX and AVX10, the faster software adopts them and the wider the gap with ARM grows.
My guess is that this is a necessary thing in a world where the competition has a more flexible model for feature expansion.
turtlelover05@reddit
What would the real-world performance implications of nixing legacy support be? I can't really imagine it would be enough to bother with.
mi__to__@reddit
There's a strooong Apple mentality among many on this sub...old = bad and must be cut off. Legacy compatibility be damned, although it's x86's biggest strength to begin with.
spazturtle@reddit
Nobody uses 16-bit and 32-bit mode. Once your computer has finished booting they are shut down and your computer runs in 64-bit mode. 64-bit mode supports 32-bit applications.
turtlelover05@reddit
I use 16-bit mode all the time with otvdm.
BookinCookie@reddit
It would make verifying complex cores easier. That’s about it.
doscomputer@reddit
the gap literally couldn't be any wider, the only thing that even gives ARM hope for the desktop market is x86 translation layers.
No customers are asking for these weird instructions, if anything, we just want AVX3/512 to actually be consistent and unified. Instead, Intel and AMD both stated that they're going to choose the right sets for the right products anyways, so this whole skit is a big PR stunt.
the_dude_that_faps@reddit
To be fair, APX is more likely to have an impact on regular workloads over stuff like AVX512, which is very niche. The extra registers and the 3 operand encoding can be taken advantage of by just recompiling software or by managed languages with a runtime like Java, .NET, V8 on browsers, etc.
theQuandary@reddit
Gap widens?
The headliner features of APX are 3-register instructions and 32 registers. ARM has had these for 13 years now (since ARMv8 launched in 2011). RISC-V has had these since the start too.
X86S takes a very conservative approach to reducing legacy garbage. ARM did a far better version of this in 2011 when they released a completely redesigned (aka brand new) ISA and then transitioned over. RISC-V rather obviously doesn't suffer from this issue either.
AVX10 is Intel's attempt to unify the mess they and AMD created for themselves with AVX, AVX2, and AVX-512. NEON doesn't have those issues, and SVE2 (released back in 2021) also allows ARMv9 to scale wider than x86 packed SIMD, meaning that x86 is actually still behind in this area. RISC-V has a vector extension like SVE2, so there's not much catching up to do there either (though there is some, and RISC-V has been seriously looking at 48-bit instructions to enable even more registers and 4/5-register instructions, which could put it ahead of everyone).
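For what it's worth, this is roughly what the vector-length-agnostic model looks like in practice. A minimal sketch using the ACLE SVE intrinsics (assuming an SVE-capable toolchain, compiled with something like -march=armv8-a+sve2; the function and array names are just for illustration). The same binary runs on 128-bit mobile cores and 512-bit Fugaku-class cores without recompiling:

```c
#include <arm_sve.h>
#include <stddef.h>
#include <stdint.h>

/* a[i] += b[i], written without ever naming the vector width.
 * svcntw() asks the hardware how many 32-bit lanes it has,
 * and the predicate masks off the tail iteration. */
void add_arrays(float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_u64(i, n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, a + i, svadd_f32_x(pg, va, vb));
    }
}
```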
Falvyu@reddit
I agree that 'widening the gap' claims don't make much sense. The focus is moving towards more efficient CPUs, and that's where some ARM designs (i.e. Apple) excel.
However, I don't agree with a lot of your arguments.
True about the 32 registers. But 3-register instructions have been in x86 since at least AVX (2011).
NEON has the same issue (though, to a lesser extent).
SVE also created a mess: there's SVE, SVE2, and SSVE. Some machines have SVE (Fugaku). Others have both (Graviton 3 / Grace Hopper) and some others have SSVE and no SVE2 (Apple M4).
In theory yes. In practice, only Fugaku has 512-bit SVE. All other implementations have 128-bit SVE registers.
RVV is fun to write code for, and SoCs have been released earlier this year with RVV 1.0 support.
But while RVV should be decent on in-order cores, its implementation on OoO cores is going to be a nightmare due to the implied vector states. To me, it's as if they looked back at the RISC vs CISC debate and decided to go for the most CISC-looking method.
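To make the 'implied vector state' point concrete, here's a rough sketch of a strip-mined RVV loop with the v1.0 C intrinsics (names as I understand them from the intrinsics spec, so treat the exact spellings as an assumption). The intrinsics take vl explicitly, but at the ISA level that maps to a vsetvli writing the vl/vtype CSRs that every following vector instruction implicitly reads, and that's the state an OoO core has to track or predict:

```c
#include <riscv_vector.h>
#include <stddef.h>

/* a[i] += b[i], strip-mined over whatever vector length the core grants. */
void add_arrays(float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);            /* -> vsetvli: sets the implicit vl/vtype state */
        vfloat32m1_t va = __riscv_vle32_v_f32m1(a + i, vl); /* loads/adds execute under that state */
        vfloat32m1_t vb = __riscv_vle32_v_f32m1(b + i, vl);
        __riscv_vse32_v_f32m1(a + i, __riscv_vfadd_vv_f32m1(va, vb, vl), vl);
        i += vl;
    }
}
```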
Usually, people say this to prolong the RISC vs CISC circlejerk. Yes, variable-length instructions add complexity, and thus decrease power efficiency, but not as much as people think (i.e. it's not enough to completely explain the current gap with Apple). Other sets of mechanisms matter just as much, and these can come from the ISA (as you say: number of named registers, SIMD, number of instruction operands), from the micro-architecture (branch prediction, paging, cache, ...) or even software/firmware support (scheduling, idle policy, ...).
The reason why Intel and AMD are going through the trouble of making an advisory group is because they want to keep consistent features between their chips.
Right now, only AMD supports AVX512 on their client CPUs => software developers are thus going to target the common denominator, which is, at best, AVX2. If Intel wants software that leverages AVX10, APX and upcoming extensions, then it is in their best interest to ensure that AMD implements them. Otherwise, there's indeed a lot of performance left on the table, which wouldn't be a wise idea given the current competition with ARM-based CPUs (which, I think, we do agree on).
theQuandary@reddit
98% of all x86 code isn't AVX, so that is essentially irrelevant for most code. Integer code is the overwhelming majority of code, and Intel themselves state outright that the extension is a massive win.
This is a strawman. SVE was designed for HPC servers. SVE2 extended SVE so it could do everything that NEON does. Few chips ever used SVE and to my knowledge, none of those were consumer chips. All the modern core designs are ARMv9 which requires SVE2 meaning there's not a fragmentation issue. SSVE/SME isn't aimed at the exact same design space as SVE2 (though there is definitely overlap).
x86 has 17 different SIMD extensions, and that's if you don't count all the dozen or so AVX-512 extensions. NEON, SVE, SVE2, and SME are a walk in the park in comparison.
This is completely orthogonal to the question of vectors vs packed SIMD.
They'll predict it and move on. I believe this is already happening. Not the best solution, but pretty much a requirement until they agree to adding 48/64-bit instructions with enough bits to encode stuff statelessly.
The only study I've ever seen on this topic showed Haswell decoders using a whopping 22% of total core power on integer workloads (the common workload type). If you're going to claim that it doesn't matter very much, at least provide some evidence.
Lion Cove is 4.53mm2 (without last-level cache) and M3 (on the same node) is 2.49mm2 (without last-level cache). Something is absolutely wrong with that picture.
For DECADES, using the Intel compiler and libraries would absolutely screw over AMD performance. Intel has a massive number of developers ensuring all the common compilers work well with their instruction sets. Because Intel controls a supermajority of the market, if they add those instructions, developers will use them. Further, there are already tons of runtime compatibility flags for basically every binary out there and that is a solved problem.
There is certainly something beyond that to this sudden change of heart.
Falvyu@reddit
I thought you were referring to instructions in general. If you are referring to 3-operand scalar instruction then I mostly agree with the initial point.
As for the '98% of all x86 code isn't AVX' (i.e. AVX or AVX2): it's often the 2% that matters (video encoding/decoding, some image processing, simulation, 'scientific' computing, parsing).
FEAT_SVE2 requires FEAT_SVE (cf. ARM documentation). As you say, SVE2 is indeed meant to do everything that NEON can do, and more (i.e. it's meant to eventually replace NEON). Regardless of its initial intent, SVE is meant to eventually be included in a wide span of devices (and nothing wrong with that). Now, ARMv9 does not mandate SVE/2 (cf. ARM documentation).
And neither the Snapdragon 8 Gen 3 nor the Apple M4 chip supports SVE2, even though both are ARMv9.2 (the latter only supports Streaming SVE and SME). On the other hand, Google's Pixel 8 smartphone has SVE2. That's also the issue: you can't rely on ARMv9 alone to determine whether a chip has SVE/2 or not. Eventually, SVE/2 will be included in all ARM chips. But for now, you have to do it on a case-by-case basis (or detect it automatically).
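(For the 'detect it automatically' part: on Linux that boils down to an hwcap check. A minimal sketch; the bit values are copied from the arm64 kernel headers as a fallback in case the libc headers don't define them:)

```c
#include <sys/auxv.h>
#include <stdio.h>

#ifndef HWCAP_SVE
#define HWCAP_SVE   (1UL << 22)   /* fallback values from Linux arch/arm64 hwcaps */
#endif
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2 (1UL << 1)
#endif

int main(void) {
    int sve  = !!(getauxval(AT_HWCAP)  & HWCAP_SVE);
    int sve2 = !!(getauxval(AT_HWCAP2) & HWCAP2_SVE2);
    /* e.g. a Pixel 8 reports both, while a Snapdragon 8 Gen 3 would not report SVE2 */
    printf("SVE: %d, SVE2: %d\n", sve, sve2);
    return 0;
}
```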
On top of it, SVE/2 has a few 'sub-extensions':
FEAT_SVE_AES
FEAT_SVE_BitPerm
FEAT_SVE_SHA3
FEAT_SVE_SM4
I agree that these sub-extensions can be quite niche. Though, lots of AVX512 extensions are also niche (e.g. AVX512_VP2INTERSECT, AVX512_VAES, ...). Speaking of AVX512, it's worth highlighting that its mess is somewhat saved by the smaller diversity of implementations, and it can be split into two tiers: Skylake and Icelake.
Though, I do agree that the AVX512 specifications remain more complex than they should be (Intel does too, hence their AVX10 plan). I'm just pointing out that the comparison to ARM is not as clear cut as it can be claimed.
I disagree.
Variable-length SIMD registers don't matter as much if you stick to the same vector sizes, at least for a long duration of time.
x86 introduced 128-bit SIMD registers back in 1999. 25 years later, 128-bit is getting a bit small on x86 but seems okay on a lot of current ARM implementations (NEON or SVE). A 256-bit SIMD register size would probably last a while too, as it seems to be a good 'general' fit. Whether variable-length SIMD is critical can be summed up as 'How often are we okay with updating/recompiling the code?'.
Personally, as long as operations remain consistent across sizes (e.g. like between AVX10 sizes), I think once every 10 years can be acceptable, and I expect 256-bit to remain relevant until then (we'll see, I'd love to be proven wrong).
The main issue with a stateless RVV: you still have to support the old extension to ensure backward compatibility.
I don't think state prediction is a great idea at all. They may have had specific workloads in mind, where it wouldn't be an issue, but the misprediction penalties may be steep in others. I think moving some of the complexity back to software, providing expressive intrinsics (even if they don't map 1-to-1 onto existing instructions) and sticking with an SVE-styled SIMD paradigm is a better approach. But we'll see.
First of all, 'integer workloads' is a broad term that can mean anything from finite-state machines to text parsing to parts of image processing, ...
I'm pretty sure I have seen the paper you refer to, in which case:
On top of this, I have found this paper from USENIX which also measures the power consumption of the x86 instruction decoding scheme. They find it to be closer to 5-10%, but, as I said previously, on specific microbenchmarks (quote "Nevertheless, we would like to point out that this benchmark is completely synthetic. Real applications typically do not reach IPC counts as high as this. Thus, the power consumption of the instruction decoders is likely less than 10% for real applications.").
I agree with you over Intel screwing AMD. However, I think Intel has realized that this strategy won't work anymore:
Developers target the lowest common denominator because developing and supporting multiple code paths can be difficult. That's one of the reasons why AVX512 provides limited benefits in games. That's also why Intel has made it clear in the AVX10 revision that they have no plan to support AVX10/128-only CPUs (this post goes into more detail about this issue). It's also one of the major reasons behind variable-length SIMD vectors in SVE and RVV: you said it yourself, their size can scale without introducing new instructions. (As said previously, I'm less convinced about that specific part. 256-bit has been a sensible size for a while now, and most current SVE implementations only use 128 bits anyway.)
Adromedae@reddit
Mate, x86 has had 3-register ops forever.
the_dude_that_faps@reddit
Arm is still not in a leading position in the data center. At all. And the gap is still very real. Unless you mean something different? Neither Graviton4 nor AmpereOne is close to Epyc at all.
theQuandary@reddit
Graviton4 is already very competitive in a lot of workloads. More importantly, it is cheap. A full graviton4 system is $4.308/hr while a matching 96-core AMD 4th gen EPYC system is $5.564/hr.
Most workloads are generic web server type applications. For these things, Graviton4 is practically the same performance level as EPYC while costing way less per year and that matters for developers.
On a server, legacy garbage doesn't matter at all and x86 doesn't matter either as all the tooling and infrastructure you need for 99.9% of tasks is already native to ARM.
the_dude_that_faps@reddit
Well that is a generalization if I've ever seen one.
Not even when it was Epycs vs Xeon with both on the same ISA was this true. It is very easy to port over to ARM, but it is not as easy as not doing anything at all.
The gap still exists. As much as you may want to diminish it, it still exists.
theQuandary@reddit
What is your experience with software development?
My current F500 company is a great example. We have tens to hundreds of thousands of server instances spun up. Our HPC and AI server usage is a rounding error when compared to all the instances we need to spin up for various user functionality and what is basically glorified CRUD stuff.
What do you think most developers do every day? The overwhelming majority of developers are writing in managed languages like Java, C#, JS, Ruby, Python, Go, PHP, etc, where the runtime and OS handle the hard hardware integration bits. Only a small subset of devs are writing code where the ISA matters a lot.
The gap varies dramatically based on workload. Look at the Phoronix review. Some workloads you might be better off with EPYC. Other workloads have Graviton being straight-up faster and 23% cheaper too.
The fact that Graviton4 with Neoverse V2 (based on X3) is so competitive with EPYC should terrify AMD because Neoverse V3 based on X925 was announced early this year meaning we'll probably be seeing server chips using it in the next few months.
the_dude_that_faps@reddit
I've been doing it for 20 years at this point.
So all of that is just doing static content? No caching? No storage? No databases?
But you're ignoring platform enablement. For example, for OpenTelemetry collectors, the only platform with tier 1 support is linux/amd64. Anything arm64 is tier 2. And this is just one example off the top of my head.
And, as I said, are you using databases, caching, storage, observability, etc? Those things need enablement and optimization work too. Are you running CI? Is your CI set up for cross-compilation? Are you using Python libraries that use compiled code? Go is compiled, btw.
> The gap varies dramatically based on workload. Look at the Phoronix review.
I don't think there's anything comparing Zen 5 to Graviton on Phoronix. At least when I read their article a few days ago it only compared it to the previous gen and Xeons. Regardless, the gap is there.
Obviously. I'll even one-up you: I think the battle with Graviton is effectively lost, regardless of performance. Amazon will always have a cost advantage. But Graviton is AWS exclusive.
DerpSenpai@reddit
ARM has the lead in the actual architectures. Apple P-core IPC is best in class, with ARM's X925 second. AMD and Intel are very much on the back foot. ARM gets yearly the IPC improvements that AMD, with a much fatter R&D budget, produces in 2 years.
the_dude_that_faps@reddit
Apple is best in class. Not Arm. Not yet anyway. And that's as much a fact of Apple's prowess as it is a fact of Apple's decisions towards catering exclusively to their unique platform and the fact that they spend like crazy.
One example of this last point is that macOS on Apple Silicon uses 16 KiB memory pages instead of the 4 KiB most other OSs use. This means that, due to architectural considerations, they can pack 4 times as much L1 cache without increasing set-associativity vs x86 systems, at the very least. And this is not an Arm thing, this is an Apple thing.
For comparison, here's a blog post discussing this change on Android: https://android-developers.googleblog.com/2024/08/adding-16-kb-page-size-to-android.html?m=1
It nets an extra ~10% performance increase on average just by increasing the memory page size. It would also net an increase in performance by allowing larger L1 cache sizes (when actually done). And again, this last bit is not an ARM thing, it's an Apple thing. Maybe it's an Oryon thing too; specifications around this have been hard to obtain. Specs seem to say that it has 96 KiB of 6-way set-associative L1 data cache, which would be consistent with a 16 KiB memory page. But Microsoft hasn't been very forthcoming with details, and the only information I can find online points to Windows on Arm also using 4 KiB memory pages.
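(The L1 argument is really just arithmetic: with a virtually-indexed, physically-tagged L1, the index bits have to stay inside the page offset, so the largest alias-free size is page size times associativity. A toy sketch, with the associativity as an assumed illustrative value:)

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long page = sysconf(_SC_PAGESIZE);   /* 4096 on typical x86, 16384 on Apple Silicon */
    long ways = 8;                       /* assumed L1 associativity, purely illustrative */
    /* Each way can be at most one page before index bits would need translated address bits. */
    printf("page %ld B, %ld-way -> max alias-free VIPT L1: %ld KiB\n",
           page, ways, page * ways / 1024);
    return 0;
}
```

With 4 KiB pages and 8 ways that caps out at 32 KiB; with 16 KiB pages the same associativity allows 128 KiB, which is the "4 times as much L1" mentioned above.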
Anyway, Apple dropped support for 4 KiB memory pages; ARM hasn't, to the best of my knowledge (I'm not up to date on the latest changes in their specifications). There's nothing stopping x86 CPUs from doing this except legacy compatibility, which is also a factor for existing ARM devices. I mean, the ARM spec supports 16 KiB memory pages, but to net all of the advantages a design would need to drop support for 4 KiB in hardware.
Apple has been using 16 KiB memory pages in iOS since the transition from 32-bit to 64-bit. So this has been baked in for a long time, and it is a permanent boost in performance. And this is just one aspect of Apple's unique hardware design and platform. There is no server CPU based on Apple's design because they built something for their needs.
And this is just one thing Apple does differently just because they own everything. There are more that give them an edge on performance, efficiency or even both. Like using in package memory, or integrating the SSD controller on die, etc.
Also, IPC is a pretty irrelevant metric for comparing entirely different architectures, what matters is performance and that is a very nuanced conversation. It's even irrelevant for comparing AMD to Intel.
Then there's the fact that Apple (and therefore their customers) pay a premium just for the fact that Apple always has a node advantage vs AMD, Intel and other ARM designs. While on the datacenter, the conversation is different and that's why neither Graviton nor Ampere designs have dethroned Epyc.
Is it really much fatter? That's very hard to say, isn't it? Apple and Qualcomm both spend more on R&D than AMD, but they all do different things. AMD also does GPUs and, especially, AI chips for the data center, which is very likely where most of the spending went during the past few years. If you've seen the monster that MI300X is, you'd realize that designing that was probably veeeery expensive. ARM Holdings by itself has seen spending rise by over 70% this last year to almost $2B, and they don't even manufacture and sell chips and are just recently starting to target more than mobile and embedded.
I will give you, though, that Intel and AMD do probably spend more money than Apple or Qualcomm, or even ARM, on some things like testing and validation. Current x86 CPUs are expected to run software from ages ago without issue. That takes energy and time. But that tends to correct itself whenever an architecture becomes old enough.
Anyway, I think I've said enough to state my case. I also understand that this discussion is quickly veering into holy war territory and I don't really care much for that. The day I can purchase a PC with an ARM or RISC-V CPU for the same price as a PC with Intel or AMD hardware, with equal or better performance, and plug in my own discrete GPU will be a very exciting day for me. I just also like the fact that x86 keeps evolving, with AMD hanging on.
TwelveSilverSwords@reddit
Are you implying that Apple can't scale their architecture to make a server CPU?
the_dude_that_faps@reddit
I'm not implying anything. I'm saying Apple built what they wanted for their needs very specifically, whereas AMD uses the same basic CCD for everything from high performance laptops, to desktop parts, to workstations, to servers and also APUs like MI300A.
You don't scale to hundreds of cores without investing heavily in everything that is not the core. Somehow AMD has managed to outcompete everyone in this regard while not having the strongest core. Look at Lion Cove vs Zen 5 in laptops and then look at how badly Epyc beats Granite Rapids.
Can Apple do it? Sure. Are they doing it? No, they're not. And that's my point. They're minmaxing everything about their architecture for their use-cases. That's a very specific Apple advantage, not an ARM advantage.
TwelveSilverSwords@reddit
Absolutely, the ARM ecosystem is in a very strong position. This is why this "x86 Advisory Group" is so important.
theQuandary@reddit
Can it be said that these things have been holding back x86 CPU IPC?
Yes.
2-register instructions mean that reusing a value requires an extra MOV instruction vs a 3-register instruction. Yes, it "goes away" during register renaming, but it blocks up L1, gives x86 even more instructions to decode, and adds pressure on the uop cache too.
Intel stated that moving from 16 to 32 registers would reduce total loads by 10% and stores by 20%.
Overall, Intel claimed a 10% reduction in total instructions. This breaks down into two things. On the L1 cache side, things aren't 10% better because each APX instruction must have an entire extra prefix byte (2 in some cases?). This means there are fewer instructions, but they are larger.
On the uop cache side of things, the situation seems better. Those prefix bytes go away and you are simply left with fewer overall instructions and the length is probably the same as before (I'd guess that 2-register instructions are changed to 3-register instructions for the sake of the renamer).
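To put the MOV point in concrete terms, here's a tiny example; the APX line is my paraphrase of Intel's "new data destination" description rather than verified compiler output, so treat it as a sketch:

```c
/* Both inputs stay live after the add, which is what forces the extra copy today. */
int add_keep(int a, int b) {
    int c = a + b;      /* legacy x86-64 (destructive 2-operand add):
                         *     mov  eax, edi       ; copy so 'a' in edi survives
                         *     add  eax, esi
                         * APX-style 3-operand (new data destination) form:
                         *     add  eax, edi, esi  ; one instruction, no copy
                         */
    return c + a * b;   /* a and b reused here */
}
```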
Rd3055@reddit
Interesting.
AMD and Intel do cross-license the x64 ISA as we know it today.
The other industry players (Dell, Google, etc.) are obviously playing off x86 and ARM against each other.
But I have a question that perhaps someone here could answer: what exactly are they talking about when it comes to interoperability problems?
I thought that x86 or x64 software, whether on Windows or Linux, was binary compatible with Intel or AMD, and that the software would check the CPU type and, if necessary, use different code if, for instance, a certain CPU didn't have a certain instruction or something.
cafk@reddit
x86 and x86_64 are pretty straightforward - but add-ons like SSE 4.1 a/b/c and the newfangled AVX10 have subsets of extensions (i.e. AVX-VNNI) that are not implemented at the same time, or are implemented in a different manner, meaning you'll get different performance depending on the individual uArch.
I.e. Intel lowering their clock speed during AVX2 workloads in initial generations, or AMD using 2x AVX2 to achieve AVX512 support.
The software needs to be written in such a manner that a compiler (i.e. gcc or clang, which both Intel and AMD contribute to) produces different OS- and x86-extension-specific code from the same optimization flags. And without the developers checking functionality on a multitude of generations of processors, they won't notice that the fallback code goes back to the functionality of the 2000s.
I.e. gcc versus clang
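As a concrete illustration of how that per-CPU dispatch is usually expressed, here's a hedged sketch using GCC/Clang function multiversioning (target_clones is a real attribute; the function itself is just a toy):

```c
#include <stddef.h>

/* The compiler emits one clone per listed target plus a resolver that
 * checks CPUID at load time and picks the best clone, so CPUs without
 * AVX2/AVX-512 silently fall back to the "default" code path. */
__attribute__((target_clones("avx512f", "avx2", "default")))
void scale(float *x, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        x[i] *= k;   /* auto-vectorized differently in each clone */
}
```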
Then with the current chiplet and big.LITTLE architectures, scheduling requires the operating system to also be optimized for various behaviors, like turbo behavior (Intel's 13th and 14th gen i9 issues, caused by a multitude of CPU, OS and mainboard shenanigans) - which has been a discussion point; or like Intel removing hyper-threading in their current generation, which had been there since the late 2000s.
It's more complex than just x86 and x86_64, as that is just a standard interface, what the CPUs do behind the scenes is way more complicated.
Rd3055@reddit
Interesting post (and thanks for the compiler comparison website. It speaks to my inner nerd).
Now, I understand. It's basically tying up all the loose ends under the hood in terms of uArch design and other things to streamline x86, x86_64 and make it a more coherent platform.
It would be great if they could "trim the fat" (so to speak) and make it even more competitive with ARM on a performance-per-watt ratio.
doscomputer@reddit
Dell wouldn't be able to sell computers without x86 chips, Google doesn't really sell hardware other than phones, and ChromeOS is literally chip agnostic. I'd bet there are probably more x86 Chromebooks than ARM ones.
I swear r/hardware is worse than the beyond3d forums at this point
Rd3055@reddit
Dell still has an interest in increased CPU choice/competition.
Otherwise, they would not have invested in making laptops with Snapdragon Elite chips to give ARM another chance.
Also, way to be a douche canoe.
Adromedae@reddit
It's just about marketing and visibility.
X86 has been extended nonstop all along. It just doesn't make the rounds.
AMD and Intel want to sort of make the process more visible to gain mindshare.
That's all.
basil_elton@reddit
First thing that comes to mind - Intel ME and AMD PSP.
Also whatever open standard AMD is looking to in the future as they replace AGESA. IIRC Intel has not talked about anything like that so far.
Unified software stack across every product category - Intel has OneAPI that brings everything under one umbrella. AMD still splits it up between CPU/GPU/FPGAs. Though this will need to have only the CPU component as common, as the other stuff is not x86 per se.
No more ISA segmentation along the lines of extensions to the basic ISA, like the infamous AVX-512 soup.
Exist50@reddit
Tbh, that's a very thin veneer over multiple backends. It's not really unified in practice. Killing their AI ASICs will probably help consolidate to something more reasonable though.
AtLeastItsNotCancer@reddit
I imagine they want to broaden their collaboration when it comes to developing future extensions and revisions of the base x86 ISA, for example AVX10, x86s and beyond. Especially when it comes to x86s it'd be great to have everyone on the same page (including the software/OS people), you really don't want each company to do their own thing and make everyone's life harder when it comes to compatibility.
Exist50@reddit
x86s is probably dead now. The main team behind it at Intel were laid off/quit.
RealPjotr@reddit
At best, yes.
Real world (at least for many years) https://medium.com/codex/fixing-intel-compilers-unfair-cpu-dispatcher-part-1-2-4a4a367c8919
"During compilation, the Intel compiler adds a little bit of extra code that checks the vendor string from CPUID. If the vendor string is “GenuineIntel” (i.e. an Intel processor) then the software uses the optimized code path, with SIMD instructions. If the vendor string is “AuthenticAMD” (i.e. AMD processor) or anything else, then the software runs the unoptimized path."
R1chterScale@reddit
In addition to what was mentioned, maybe instruction set support (mainly wrt the clusterfuck that is AVX512/AVX10)
masterfultechgeek@reddit
I'm going to speculate.
DanLuu has an AWESOME website that touches on things including his time at Centaur.
He mentioned that in designing CPUs one thing they worried about was compatibility.
Intel and AMD both have bugs in their CPUs. One of the debates that was had at Centaur was whether or not to INTENTIONALLY replicate a bug for the sake of compatibility/consistency.
I can imagine both AMD and Intel wanting to squash some bugs and edge cases where things perform a bit differently on one CPU or another.
theQuandary@reddit
Sounds like ARM and especially RISC-V really have Intel/AMD worried.
cuttino_mowgli@reddit
Yeah, most of their top customers are developing their in-house ARM or RISC-V chips.
doscomputer@reddit
yeah for low price mobile products or cloud compute ASICs, not exactly volume parts.
Apple is a thing, I'll give you that, but even then nobody is buying Macs specifically for the ISA; they're buying them for Apple's engineered battery life and good performance. The same people who bought Intel Macs are the ones buying ARM Macs. And they'd go back to buying Intel Macs if Apple wanted to 180; it's what's known as a captive market.
PeakBrave8235@reddit
Especially RISCV? Lol? Why especially? RISCV has zero consumer products on the market. Tired of the constant open source blowing simply because it’s open source
doscomputer@reddit
there are riscv microcontrollers you can get, even Raspberry Pi recently launched a dual ARM/RISCV native chip that can use either ISA.
theQuandary@reddit
Open Spec != Open Source
The top-end chips will 100% be proprietary.
What do you mean by "consumer"? RISC-V is taking over the MCU market so fast that ARM has already started moving their embedded designers over to HPC in anticipation of that market basically going away. By the raw numbers, there are more RISC-V chips made in the last 1-2 years than x86 chips made in the past decade or so.
If a company uses ARM, they are completely dependent on ARM for anything they want or need and unless they are Apple, they have absolutely no say in how the ISA is designed. On top of this, they must pay ARM very significant royalties (significant enough that Qualcomm decided it was worth possibly getting sued in order to lower their royalty payments).
If you look at the inflection point of ARM and x86, it took around 15 years for ARM to catch up. RISC-V only got all the stuff they needed for desktop chips a couple years ago and it already looks like the inflection point is coming very soon. Lots of big companies are heavily investing with some like Alibaba investing billions into not only the ISA, but the software too. NASA is moving toward using RISC-V. A bunch of EU countries are investing in RISC-V along with others like Brazil, India and China. They want freedom from US/UK control of the chip market.
PeakBrave8235@reddit
If you want to be actually precise, it’s an open standard. My point remains the same. People blow RISCV because it’s “open.” It’s like this weird phenomenon where people want to admit ARM is the future but don’t want to actually admit they were wrong about x86 being the future or something, so let’s promote this “open” standard because…. It’s not ARM and it’s “open.” I have not seen any tangible benefits to a flagship SoC choosing RISCV rather than ARM.
The entire premise of the comment I replied to was faulty. Intel and AMD don’t care “especially” about RISCV. They’re always developing their own ARM chips. I don’t see why they’d be “especially” worried about RISCV when it’s ARM chips that are slaughtering theirs right now. That’s my point. Why the hell would they be “especially” worried about an ISA that’s being used in microcontrollers, not actual flagship chips?
By consumer I’m referring to SoCs shipping in actual products that do more than microcontrollers. Cool that RISCV is in some microcontrollers. I’m sure that helps pad profits for low-margin OEMs.
And? When have OEMs like Qualcomm, Intel, AMD, etc shown a willingness to ditch the past and embrace the future? They’ve been extremely happy to ride on the coattails of Apple, who has pushed the industry forward with ARM ISA. All the forward thinking stuff is worked on by Apple and Arm. It’s why you get 64 bit processing in a mobile phone, for example.
Lol Qualcomm the greedy company that charges insane royalties to every customer and double dips on customer revenues. They’re merely profit seeking. I mean, if they want to lower their costs then sure, but other companies have rightfully shown they do the same.
theQuandary@reddit
RISC-V arguably isn't a standard because it isn't recognized or approved by any normal standards body. What some generic strawman person asserts has nothing to do with me.
Qualcomm put forward an entire proposal for RISC-V so they could have an easier time porting X Elite to RISC-V because of the money it would save them. The bottom line matters to companies and outside of Apple, nobody wants to pay the kinds of royalty ARM wants (especially since SoftBank ramped up the pressure).
Nvidia almost purchasing ARM was yet another vote of no confidence. No company wants to risk the rug being jerked out from under them and RISC-V offers complete protection from this.
The rise of RISC-V is happening at an insane pace and the long-term threat of hundreds of the world's largest companies and countries far exceeds the threat from ARM.
It takes 4-5 years to launch a new uarch. We are 2 years in from the RVA22 standard. We know for sure that large, wide cores are being worked on and the companies involved have massive funding. Your skeptical position simply has no merit in my estimation and you haven't offered any actual facts to change my mind.
PeakBrave8235@reddit
RISCV literally calls themselves an open standard.
https://riscv.org/blog/2023/05/risc-v-an-open-standard-instruction-set-architecture/
“In this blog post, we’ll explain why RISC-V is an open standard instruction set architecture (ISA).”
Yeah that’s my problem. Any talk around RISCV revolves around profits, not technology advances.
And yet no one has bought ARM lol
Uh okay
Okay lol
soggybiscuit93@reddit
Expect, long term, RISC-V to be of great interest to nations like China, Russia, etc. To have an ISA that's not beholden to sanctions or on the "other side" of the growing tech divide.
PeakBrave8235@reddit
…and? What does that have to do with Intel/AMD somehow caring more about RISCV than ARM?
soggybiscuit93@reddit
Because it's going to have nation-state backing. Not a risk today doesn't mean no risk in the future.
PeakBrave8235@reddit
I’m confused why Intel/AMD are presently concerned “especially” with RISCV still.
BookinCookie@reddit
They aren’t. RISC-V isn’t going to be a real threat to them for at least another decade. ARM is a real threat right now.
doscomputer@reddit
sounds like you didn't even read the article
From-UoM@reddit
I think it's GPUs that are making them more worried. Especially with how Nvidia is pushing everything to be GPU accelerated and less reliant on CPUs.
theQuandary@reddit
Tenstorrent's designs are a lot like Larrabee done right. Lots of other companies are using RISC-V for their NPU designs too.
RISC-V has some potential to take over the GPU market too. That's an especially exciting prospect for me because if we could settle on a single GPU ISA, then all our layers of GPU abstraction could simply go away and we'd get more portable and faster software.
Adromedae@reddit
Some of the word salads in this thread are hilarious, thank you.
jaaval@reddit
I think it also has to do with Intel proposing a set of fairly big changes to the ISA, and they really need AMD on board for those. There are the x86S proposal, the APX proposal and the AVX10 proposal in the works right now. If they want software support to be there, they need AMD.
Exist50@reddit
That is basically dead with Royal's cancellation.
AVX10 is also a minor iteration on AVX512. Shouldn't be any tension there. The only moderately interesting one left is APX.
Edenz_@reddit
Can you elaborate any more on what Royal was going to be? Obv we’ve heard that it was an extremely wide uArch, but can you give any details about what the team were aiming for or high-level structure sizes?
TwelveSilverSwords@reddit
https://www.reddit.com/r/intel/comments/1f945fl/some_rumors_about_the_royal_core_project/
Exist50@reddit
Btw, I should correct myself slightly from my last reply. I misread Raichu's remarks about IPC. He's correct about RYL1 being ~2x, and technically correct about RYL1 being cancelled last year, but the Royal project as a whole was only cancelled in July. And the comparison should have been RYL1.1 or RYL2. Basically, everything /r/BookinCookie says. He/she knows what's up.
Edenz_@reddit
Yeah I've seen this but I get the feeling that u/Exist50 knows whats BS and whats real.
Exist50@reddit
Don't know specific sizes, and they probably would be hard to compare apples to apples even if I did, but yes, was supposed to be extremely large in all regards, with a commensurate increase in IPC. Heard very early targets were set in terms of whole number multiples of Golden Cove, for whatever that's worth.
In this context, Royal64 was also the internal name for x86s. That's why I think it's now dead. No sense cutting out legacy if you're just iterating on a legacy core.
imaginary_num6er@reddit
Well yeah. Royal Core was supposed to obsolete non-x86 systems in both performance and efficiency “for the foreseeable future” per MLID
metakepone@reddit
Yes, only for Intel to conveniently cancel it just so MLID can yell "WTF INTEL!?"
All of it checks out.
TwelveSilverSwords@reddit
MLID isn't the only person who has floated Royal Core rumours. See this:
https://www.reddit.com/r/intel/comments/1f945fl/some_rumors_about_the_royal_core_project/
Oh yeah, MLID's narrative about Royal Core is half made-up nonsense.
Exist50@reddit
The claims from that link are just as inaccurate, if not more so, than the early MLID stuff. E.g. Royal did target much, much higher IPC, and was only killed a few months ago.
BookinCookie@reddit
And now Royal’s going to be on RISC-V . . .
chimado@reddit
As it should, and I think it's great that they're responding; the more ARM and x86/amd64 compete, the better the CPUs we can buy become.
TwelveSilverSwords@reddit
Intel and AMD are on the same team- Team x86 that is. Infighting between the two will avail them nought. The real threat is the Others (ARM, RISC-V) and that's the Great War, the only war that matters.
grobouletdu33@reddit
Why not, well, open up x86 licensing?
EnoughDatabase5382@reddit
Since the companies involved in this partnership are the same ones behind the earlier UALink initiative for high-speed CPU-GPU connections, it's likely that this new collaboration is a continuation of those efforts.
imaginary_num6er@reddit
Maybe Tim Sweeney can focus on finishing EGS first?