Latest ARM CPU cores compared: Performance-Per-Area and Performance-Per-Clock
Posted by TwelveSilverSwords@reddit | hardware | 47 comments
Core | INT | INT% | FP | FP% | P | Area | Clock | PPA (%/mm²) | PPC (%/GHz) |
---|---|---|---|---|---|---|---|---|---|
A18-P | 10.7 | 120% | 16.0 | 114% | 117% | 3.1 mm² | 4.04 GHz | 36.56 | 28.96 |
A18-E | 3.3 | 37% | 5.0 | 35% | 36% | 0.8 mm² | 2.2 GHz | 45.00 | 16.36 |
Oryon-L | 8.9 | 100% | 14.0 | 100% | 100% | 2.1 mm² | 4.32 GHz | 47.61 | 23.14 |
Oryon-M | 5.2 | 58% | 8.0 | 57% | 58% | 0.85 mm² | 3.53 GHz | 68.23 | 16.43 |
X925 | 8.8 | 99% | 13.9 | 99% | 99% | 2.8 mm² | 3.63 GHz | 35.35 | 27.27 |
X4 | 7.4 | 83% | 10.0 | 71% | 77% | 1.75 mm² | 3.3 GHz | 44.0 | 23.33 |
A720 | 3.6 | 40% | 5.7 | 40% | 40% | 1.0 mm² | 2.4 GHz | 40.0 | 16.66 |
Notes
- A18-P and A18-E as implemented in the Apple A18 Pro.
- Oryon-L and Oryon-M as implemented in the Snapdragon 8 Elite.
- Cortex X925, Cortex X4 and Cortex A720 as implemented in the Dimensity 9400.
- SPEC2017 INT/FP numbers taken from this Geekerwan video.
- Core areas measured from Kurnal's die shots of the 3 SoCs.
- Only L1 caches are included in the core areas.
- All 3 SoCs are manufactured on TSMC's N3E process, so this can be considered an iso-node comparison.
- INT% and FP% are relative to Oryon-L (= 100%). P is obtained by adding the INT and FP percentages and dividing by 2.
- PPA = Performance Per Area. This is obtained by dividing P by Area.
- PPC = Performance Per Clock. This is obtained by dividing P by clock speed.
- I also wanted to do a Performance Per Watt comparison, but decided against it. I am a firm believer that power curves are essential to get a full picture of a core's efficiency. You can view the power curves of all the above CPU cores in the Geekerwan video I linked above.
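The metric definitions in the Notes can be checked with a short script. This is a minimal sketch, not the OP's actual spreadsheet: the inputs are copied from the table, and rounding P to a whole percent is an assumption based on the Oryon-M row ((58 + 57) / 2 = 57.5, listed as 58%).

```python
# Minimal sketch of the metric definitions from the Notes.
# Inputs are copied from the table; rounding P to a whole percent is an assumption.

def perf_score(int_pct: float, fp_pct: float) -> float:
    """P: the mean of the INT% and FP% columns (Oryon-L = 100%)."""
    return (int_pct + fp_pct) / 2

def ppa(p: float, area_mm2: float) -> float:
    """Performance Per Area, in % per mm^2."""
    return p / area_mm2

def ppc(p: float, clock_ghz: float) -> float:
    """Performance Per Clock, in % per GHz."""
    return p / clock_ghz

# name: (INT %, FP %, core area in mm^2, clock in GHz)
CORES = {
    "A18-P":   (120, 114, 3.1, 4.04),
    "A18-E":   (37, 35, 0.8, 2.2),
    "Oryon-L": (100, 100, 2.1, 4.32),
    "Oryon-M": (58, 57, 0.85, 3.53),
    "X925":    (99, 99, 2.8, 3.63),
    "X4":      (83, 71, 1.75, 3.3),
    "A720":    (40, 40, 1.0, 2.4),
}

if __name__ == "__main__":
    for name, (int_pct, fp_pct, area, clock) in CORES.items():
        p = round(perf_score(int_pct, fp_pct))  # whole percent, as in the P column
        print(f"{name:8s} P={p:4d}%  PPA={ppa(p, area):6.2f}  PPC={ppc(p, clock):6.2f}")
```

Most rows reproduce the table to within the last digit (the table seems to truncate rather than round, e.g. 68.24 vs 68.23 for Oryon-M). The A18-P row is the exception: 117 / 3.1 ≈ 37.74, whereas the listed 36.56 corresponds to 117 / 3.2, so either the area or the PPA value may be a typo.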
Observations
Let me know if I have made any mistakes in the data or calculations.
SmashStrider@reddit
Oryon cores have some pretty impressive performance for how big they are: Zen 5 or Lion Cove level performance while being almost 2 mm² smaller.
6950@reddit
Zen5 has AVX-512 and SMT taking area
f3n2x@reddit
SMT is negligible as far as size goes, but yes, AVX-512 probably takes up quite a bit indirectly through bandwidth requirements within the core, etc.
Either way, saying "Zen 5 or Lion Cove level performance" is a hell of a stretch, considering lots of optimizations have gone into x86 cores which benefit stuff like gaming but are never measured in these comparisons.
TwelveSilverSwords@reddit (OP)
Zen5 is fine, but Lion Cove is rather bloated.
crystalchuck@reddit
Man, Lion Cove really is a stinker
SmashStrider@reddit
Intel really needs to improve their P-core. Their own Skymont cores give LC a real run for its money, getting within striking distance of Lion Cove in INT and FP IPC while being a third of the size and consuming way less power. As u/TwelveSilverSwords mentioned, Lion Cove is especially bloated despite being on 3nm and not using SMT or AVX-512, whereas Zen 5 is on 4nm and uses both SMT and AVX-512 while still having similar or higher IPC than Lion Cove.
To be fair though, the situation was even worse before, with the absolutely massive Cypress Cove cores offering only Zen 3 level IPC. Golden and Raptor Cove were smaller, but mainly due to higher node density, and were still more than twice as big as Zen 4 cores for slightly higher IPC. Redwood Cove, while only a minor improvement in performance, did majorly address the bloated core size of Raptor Cove, and also introduced efficiency improvements. Lion Cove is a further iteration on Redwood Cove with a better node, and definitely makes Intel's P-core look a lot better compared to the competition, but it is still inferior. Maybe Cougar and Panther Cove can address this.
battler624@reddit
Where is the data from?
Edenz_@reddit
He says in the post: Kurnal posts them on Twitter.
SherbertExisting3509@reddit
Honestly, saying that Lion Cove is bloated is kind of unfair, considering that Lion Cove beats Zen 5 in integer performance (while matching the M1) while falling behind in floating point. Zen 5 is a similar size to LNC while being weaker than the M1 in both integer and floating-point performance. It's one of the weakest P-core designs on this list.
6950@reddit
Skymont is the impressive one
III-V@reddit
I remember the discussion on Lion Cove suggested otherwise. It was like a 20%+ area impact.
Aggressive_Soil_3969@reddit
Yes. This metric will mostly show whether a chip is feature-rich or more simple/specialized.
boredcynicism@reddit
SPECfp2017 can have a little gain from AVX-512, though obviously not as much as with manual vectorization of the code.
6950@reddit
Yeah, but the gains on SIMD workloads are massive if the code is vectorised properly; it would be hilarious.
Vollgaser@reddit
Zen 5 isn't actually that big without the L2: it's about 3.1 mm² on N4P. Estimating its size on N3E isn't accurately possible, but if we just go with TSMC's figure of N3E having 1.3x the chip density, Zen 5 on N3E would be 2.38 mm². That would be slightly larger than Oryon V2, but also more powerful, especially if we consider that on N3E it could probably achieve higher clocks. I don't know about Lion Cove's size, though.
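The estimate above is a single uniform density scaling. A rough sketch of that arithmetic, applying the comment's assumed 1.3x factor directly to the 3.1 mm² N4P area; real shrink depends on the core's mix of logic, SRAM and analog and on the cell libraries used, so treat the result as a ballpark only:

```python
# Naive area scaling from N4P to N3E, using the comment's assumed 1.3x density gain.
# Real shrink depends on the logic/SRAM mix and cell libraries, so this is only a ballpark.

def scaled_area(area_mm2: float, density_gain: float) -> float:
    """Area after a node shrink, if density rises uniformly by `density_gain`."""
    return area_mm2 / density_gain

ZEN5_CORE_N4P_MM2 = 3.1   # Zen 5 core without L2, per the comment
N3E_DENSITY_GAIN = 1.3    # assumed uniform density gain, per the comment
ORYON_L_MM2 = 2.1         # Oryon-L core area from the table

if __name__ == "__main__":
    est = scaled_area(ZEN5_CORE_N4P_MM2, N3E_DENSITY_GAIN)
    print(f"Zen 5 on N3E ~= {est:.2f} mm^2 vs Oryon-L {ORYON_L_MM2} mm^2")
```

Dividing by the density gain is the simplest possible model; it ignores that a core ported to a new node is usually re-tuned (for example, different HD/HP library choices), which can move the area either way.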
TwelveSilverSwords@reddit (OP)
See here
jedijackattack1@reddit
Zen 5 is also on N4, not N3E; only Zen 5c is on N3E.
Edenz_@reddit
While this is interesting, I feel that these comparisons are dubious when the next level cache (L2 on Apple/QC and L3 for x86) play such a massive role in their performance.
I understand adding the cache area makes the comparison harder but the nuance of knowing that an A18 P-Core can access 16MB of L2 is important for these PPC/PPA comparisons IMO.
The cores don’t operate in a vacuum.
Vince789@reddit
Agreed, including private L2 (pL2) but excluding shared L2 (sL2) is very misleading.
IMO we need multiple area metrics:
The first two are fairly objective
The last one is quite arbitrary. Do we do:
Also for reference, IMO:
Although it can be argued that Qualcomm's E-cores are actually mid cores once L2 is accounted for. L2 is also what determines whether Arm's Xxxx cores are big vs mid and Arm's A7xx cores are mid vs little.
TwelveSilverSwords@reddit (OP)
X925 is twice the size of X4. That is terrific. I wonder where X930 will go.
Thanks for pointing out the error. I excluded L2 area for X925, but not X4 and A720. Will edit the table.
Vince789@reddit
Yea, but I believe the X925 being twice the size of the X4 is mostly due to HP libraries being used instead of HD libraries
From what Arm has disclosed, the microarchitecture changes don't seem to be enough to explain the die size doubling.
It's similar to how for core only area, Zen5 is about 50% larger than Zen5c (excluding pL2), despite featuring mostly the same microarchitecture
TwelveSilverSwords@reddit (OP)
Supposedly Oryon-L also uses HP libraries, so its 2.1 mm² size is impressive.
Vince789@reddit
Agreed, IMO Oryon-L is more impressive than Oryon-M
Another interesting thing is Oryon seems to perform better in GB vs SPEC, sadly we don't have more benchmarks on Android
Will be interesting to see OryonV3 with more benchmarks on WoA/Linux
MMyRRedditAAccount@reddit
Only the initial batch of devices seeded to media performed well in GB6 (~3.3k 1T, ~10k nT). Retail devices score much lower (2.9-3k 1T and ~9k nT), and performance drops even lower on Chinese devices if you disguise the Geekbench application. You won’t be getting anywhere close to the claimed performance in “normal” apps.
signed7@reddit
Note that while Qualcomm is behind in PPC/IPC, its cores seem to clock higher at similar power usage to others with lower clocks.
Wh1teSnak@reddit
Quick question: Is there anything I could read about the relationship between the clock speed and the power consumption? I always assumed they are linearly related but I guess that is not true looking at recent examples.
TwelveSilverSwords@reddit (OP)
Power consumption increases superlinearly with clock speed:
Power ∝ (Frequency)^n
n is usually 2 or more.
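The superlinear relationship follows from the standard CMOS dynamic-power approximation, P ≈ C·V²·f. A rough sketch, assuming (hypothetically) that the voltage needed for stable operation rises linearly with frequency, which makes dynamic power scale with roughly the cube of the clock. Leakage and the minimum-voltage floor are ignored, and the 0.25 V/GHz slope is made up (it cancels out of the ratio), so real curves such as those in the Geekerwan video will differ:

```python
# Simplified CMOS dynamic-power model: P_dyn = C * V^2 * f.
# Assumption (hypothetical): across the DVFS range, the voltage needed for frequency f
# rises linearly with f, so P_dyn grows ~ f^3. Leakage and the Vmin floor are ignored.

def dynamic_power(c_eff: float, volts: float, freq_ghz: float) -> float:
    """Dynamic switching power in arbitrary units."""
    return c_eff * volts**2 * freq_ghz

def voltage_for(freq_ghz: float, v_per_ghz: float = 0.25) -> float:
    """Hypothetical linear V/f curve; the slope is made up and cancels in ratios."""
    return v_per_ghz * freq_ghz

def power_ratio(f_low: float, f_high: float) -> float:
    """How much more dynamic power f_high costs than f_low under this model."""
    p_low = dynamic_power(1.0, voltage_for(f_low), f_low)
    p_high = dynamic_power(1.0, voltage_for(f_high), f_high)
    return p_high / p_low

if __name__ == "__main__":
    r = power_ratio(3.0, 4.0)
    print(f"{4.0 / 3.0:.2f}x the clock -> {r:.2f}x the dynamic power")
```

Below the voltage floor, where V stays constant, the same model gives power rising only linearly with f, which is one reason the exponent n varies between cores and operating points.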
calcium@reddit
AFAIK there is a link between the two, but not to the extent you'd otherwise think. A lot of it has to do with the architecture of the product, so comparing an x86 chip and an ARM chip won't be the same, and neither will comparisons between generations of chips, say Zen 3 vs Zen 4.
Balance-@reddit
This is quite cool!
Seems Oryon-M is a beast in PPA, and Oryon-L is also very competitive.
Those high densities should allow Qualcomm to bundle more cores in comparable SoCs. Hopefully we will see Oryon soon in the Snapdragon 7s, 7 and 7+ series.
Famous_Wolverine3203@reddit
Oryon does sacrifice PPW for PPA. It's barely better than the 8 Gen 3's E-cores on 4nm.
Vince789@reddit
Also Oryon-M's PPA isn't as impressive once you account for the huge sL2
boredcynicism@reddit
Is Oryon-L based on X925?
DerpSenpai@reddit
Oryon-L is a ground-up design by the Nuvia team, and the same goes for Oryon-M. Both are 100% independent from ARM.
TwelveSilverSwords@reddit (OP)
Just 3 years after the Nuvia acquisition, Qualcomm has already put out 3 cores: Oryon, Oryon-L and Oryon-M.
Impressive?
Famous_Wolverine3203@reddit
Oryon has been in the works since 2020.
TwelveSilverSwords@reddit (OP)
The Phoenix core in X Elite is certainly not identical to the one developed by Nuvia before the acquisition. That's what court filings say.
ARM requested that Qualcomm destroy the Nuvia IP. Qualcomm then sequestered the Nuvia IP, redesigned the Phoenix core to remove the Nuvia IP, and submitted it to ARM.
u/-protonsandneutrons- can correct me if I am mistaken.
Famous_Wolverine3203@reddit
It's unlikely to be a complete redesign. The server DNA of Oryon is very apparent. They probably iterated on it.
Raikaru@reddit
Impossible cause it was developed before it was released
boredcynicism@reddit
That depends on how close Qualcomm is with ARM, surely. Apple started working on 64-bit ARM cores before the 64-bit architecture was publicly defined.
Raikaru@reddit
Qualcomm also used to make custom cores at the exact same time, and they got 64-bit cores by dropping their custom designs with the Snapdragon 810.
boredcynicism@reddit
I don't know the exact state, but there may be a reason why they had such a serious falling out, and the involvement of Nuvia: https://www.pcworld.com/article/2497912/arm-will-cancel-qualcomms-license-to-make-the-snapdragon-x-elite.html
TwelveSilverSwords@reddit (OP)
It's a custom core designed entirely in-house by Qualcomm.
xCAI501@reddit
The same is true for Oryon-M's higher PPA, and for the same reason when compared to A18-E which has nearly equal area. I wonder how high an A18-E could clock if Apple pushed it.
TwelveSilverSwords@reddit (OP)
The Apple E-cores in M chips tend to be clocked higher. The E-core in M4 can run up to 2.9 GHz.
Noble00_@reddit
Nice! Just what I was looking for from your other discussion. I mentioned how Oryon-M was just as competitive with other efficiency cores but didn't know the size. Seems like Oryon-M is class leading with PPA, really impressed.
VenditatioDelendaEst@reddit
That said, it is more of a PPA core than an efficiency core.
https://i.imgur.com/1NUTOH3.png
MiniRusty01@reddit
Me looking at all this not understanding a single thing 👁️👄👁️