Latest ARM CPU cores compared: Performance-Per-Area and Performance-Per-Clock

Posted by TwelveSilverSwords@reddit | hardware | 47 comments

Core	INT	INT%	FP	FP%	P	Area	Clock	PPA	PPC
A18-P	10.7	120%	16.0	114%	117%	3.1 mm²	4.04 GHz	36.56	28.96
A18-E	3.3	37%	5.0	35%	36%	0.8 mm²	2.2 GHz	45.00	16.36
Oryon-L	8.9	100%	14.0	100%	100%	2.1 mm²	4.32 GHz	47.61	23.14
Oryon-M	5.2	58%	8.0	57%	58%	0.85 mm²	3.53 GHz	68.23	16.43
X925	8.8	99%	13.9	99%	99%	2.8 mm²	3.63 GHz	35.35	27.27
X4	7.4	83%	10.0	71%	77%	1.75 mm²	3.3 GHz	44.0	23.33
A720	3.6	40%	5.7	40%	40%	1.0 mm²	2.4 GHz	40.0	16.66

Notes

A18-P and A18-E as implemented in the Apple A18 Pro.
Oryon-L and Oryon-M as implemented in the Snapdragon 8 Elite.
Cortex X925, Cortex X4 and Cortex A720 as implemented in the Dimensity 9400.
SPEC2017 INT/FP numbers taken from this Geekerwan video.
Core area measured based on dieshots of the 3 SoCs by Kurnal.
Only L1 caches are included in the core areas.
All 3 SoCs are manufactured on TSMC's N3E process, so this can be considered an iso-node comparison.
P is obtained by adding INT and FP percentages, and dividing by 2.
PPA = Performance Per Area. This is obtained by dividing P by Area.
PPC = Performance Per Clock. This is obtained by dividing P by clock speed.
I also wanted to do a Performance Per Watt comparison, but decided otherwise. I am a firm believer that power curves are essential to obtain a full idea of the efficiency of a core. You can view the power curves of all the above CPU cores in the Geekerwan video I linked above.
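The P, PPA and PPC definitions in the notes can be sketched in a few lines of Python. This is only an illustration: the figures are copied from the table, Oryon-L is assumed to be the 100% baseline (as its row suggests), and the names `CORES`, `BASELINE` and `metrics` are made up for this example. Percentages are left unrounded here, so results can differ slightly from the table.

```python
# Figures copied from the table: (INT, FP, area in mm², clock in GHz).
CORES = {
    "A18-P":   (10.7, 16.0, 3.1, 4.04),
    "A18-E":   (3.3, 5.0, 0.8, 2.2),
    "Oryon-L": (8.9, 14.0, 2.1, 4.32),
    "Oryon-M": (5.2, 8.0, 0.85, 3.53),
    "X925":    (8.8, 13.9, 2.8, 3.63),
    "X4":      (7.4, 10.0, 1.75, 3.3),
    "A720":    (3.6, 5.7, 1.0, 2.4),
}

# Assumption: Oryon-L is the 100% reference core, as in the table.
BASELINE = "Oryon-L"


def metrics(name):
    """Return P, PPA and PPC for one core, following the notes above."""
    int_score, fp_score, area, clock = CORES[name]
    base_int, base_fp, _, _ = CORES[BASELINE]
    int_pct = 100 * int_score / base_int   # INT score relative to baseline
    fp_pct = 100 * fp_score / base_fp      # FP score relative to baseline
    p = (int_pct + fp_pct) / 2             # P: average of the two percentages
    return {"P": p, "PPA": p / area, "PPC": p / clock}


if __name__ == "__main__":
    for core in CORES:
        m = metrics(core)
        print(f"{core:8s} P={m['P']:6.1f}%  PPA={m['PPA']:6.2f}  PPC={m['PPC']:6.2f}")
```

Recomputing this way lands close to the table. The small gaps come from the table rounding the percentages first. One larger gap is the A18-P PPA: the listed 117% and 3.1 mm² give 117 / 3.1 ≈ 37.7 rather than 36.56.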

Observations

Let me know if I have made any mistakes in the data or calculations.

[-]

SmashStrider@reddit

Oryon cores have some pretty impressive performance for how big they are. Zen 5 or Lion Cove level performance while being almost 2 mm² smaller.

[-]

f3n2x@reddit

SMT is negligible as far as size goes, but yes, AVX-512 probably takes up quite a bit indirectly through bandwidth requirements within the core etc.

Either way saying "Zen 5 or Lion Cove level performance" is a hell of a stretch considering lots of optimizations have gone into x86 cores which benefit stuff like gaming but are never measured in these comparisons.

[-]

TwelveSilverSwords@reddit (OP)

Core	Area	SoC	Node
Lion Cove	3.4 mm²	Lunar Lake	N3B
M4-P	3.2 mm²	M4	N3E
Zen5	3.2 mm²	Strix Point	N4P
Cortex X925	2.8 mm²	Dimensity 9400	N3E
Oryon	2.6 mm²	X Elite	N4P
M3-P	2.5 mm²	M3	N3B
Oryon-L	2.1 mm²	8 Elite	N3E
Zen5C	2.1 mm²	Strix Point	N4P
Cortex X4	1.75 mm²	Dimensity 9400	N3E
Skymont	1.1 mm²	Lunar Lake	N3B
M4-E	0.85 mm²	M4	N3E
Oryon-M	0.85 mm²	8 Elite	N3E

Zen5 is fine, but Lion Cove is rather bloated.

[-]

crystalchuck@reddit

Man, Lion Cove really is a stinker

[-]

SmashStrider@reddit

Intel really needs to improve their P-Core. Their own Skymont cores give LC a real run for its money, getting within striking distance of Lion Cove in INT and FP IPC, while being a third of the size and consuming way less power. As u/TwelveSilverSwords mentioned, Lion Cove is especially bloated despite being on 3nm and not using SMT or AVX-512, vs Zen 5 being on 4nm and using both SMT and AVX-512, while still having similar or more IPC than Lion Cove does.
To be fair though, the situation was even worse before, with the absolutely massive Cypress Cove cores with Zen 3 level IPC. Golden and Raptor Cove were smaller, but mainly due to higher node density, and were still more than twice as big as Zen 4 cores for slightly higher IPC. Redwood Cove, while a minor improvement in performance, did majorly address the bloated core size of Raptor Cove, and also introduced efficiency improvements. Lion Cove is a further iteration on Redwood Cove with a better node, and definitely makes Intel's P-Core look a lot better compared to the competition, but it is still inferior. Maybe Cougar and Panther Cove can address this.

[-]

battler624@reddit

where is the data from

[-]

Edenz_@reddit

He says in the post, Kurnal on twitter posts them.

[-]

SherbertExisting3509@reddit

Honestly, saying that Lion Cove is bloated is kind of unfair, considering that Lion Cove beats Zen-5 in integer performance (while matching the M1) while falling behind in floating point. Zen-5 is a similar size to LNC while being weaker than the M1 in integer and floating point performance. It's one of the weakest P core designs on this list.

[-]

6950@reddit

Skymont is the impressive one

[-]

III-V@reddit

SMT is negligible as far as size goes

I remember the discussion on Lion Cove suggested otherwise. It was like a 20%+ area impact.

[-]

Aggressive_Soil_3969@reddit

Yes. This metric will mostly show whether a chip is feature rich or more simple/specialized.

[-]

boredcynicism@reddit

SPECfp2017 can have a little gain from AVX-512, though obviously not as much as with manual vectorization of the code.

[-]

6950@reddit

Yeah, but SIMD workload gains are massive if vectorised properly. It would be hilarious.

[-]

Vollgaser@reddit

Zen5 isn't actually that big without the L2. It's about 3.1 mm² on N4P. Estimating the size on N3E accurately is not possible, but if we just go with TSMC's number of N3E chip density being 1.3x, then Zen5 on N3E would be 2.38 mm². That would be slightly larger than Oryon V2 but also more powerful, especially if we consider that on N3E it could probably achieve higher clocks. I don't know about Lion Cove's size though.
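The node-scaling estimate above is a single division. Here is a minimal sketch, assuming the whole core shrinks uniformly by the quoted 1.3x density figure (real designs do not scale that evenly, since SRAM and analog blocks shrink less than logic). The function name `scale_area` is made up for illustration.

```python
def scale_area(area_mm2, density_gain):
    """Estimate area on a denser node by dividing by the density gain.

    Assumes every part of the core shrinks by the same factor, which is
    an optimistic simplification.
    """
    return area_mm2 / density_gain


# 3.1 mm² on N4P, 1.3x density gain quoted for N3E in the comment above.
zen5_n3e = scale_area(3.1, 1.3)
print(f"Estimated Zen5 core area on N3E: {zen5_n3e:.2f} mm²")
```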

[-]

TwelveSilverSwords@reddit (OP)

I don't know about Lion Cove's size though.

See here

[-]

jedijackattack1@reddit

Zen 5 is also on N4, not N3E; only Zen 5c is on N3E.

[-]

Edenz_@reddit

While this is interesting, I feel that these comparisons are dubious when the next level cache (L2 on Apple/QC and L3 for x86) plays such a massive role in their performance.

I understand adding the cache area makes the comparison harder but the nuance of knowing that an A18 P-Core can access 16MB of L2 is important for these PPC/PPA comparisons IMO.

The cores don’t operate in a vacuum.

[-]

Vince789@reddit

Agreed, including pL2 but excluding sL2 is very misleading

IMO we need multiple area metrics:

Core only, it's very misleading to include pL2 but exclude sL2
Overall CPU area. Core + L2 + L3 + AMX/SME areas (SLC excluded as it's a different SoC block)
Core + sL2/# cores vs Core + pL2 + sL3/# cores?

The first two are fairly objective

The last one is quite arbitrary. Do we do:

sL3/# cores? Gives the big/mid cores an advantage, and disadvantages the little/tiny cores
sL3/# big/mid core? Gives the little/tiny cores an advantage
Maybe a weighting system?
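To make the third metric concrete, here is a rough sketch of how the choice of divisor shifts area between core types. All numbers are placeholders rather than measurements: a hypothetical cluster of 2 big cores (2.0 mm² each) and 6 little cores (1.0 mm² each) sharing a 4.0 mm² cache. The function name `area_per_core` is made up for this example.

```python
def area_per_core(core_area, private_l2=0.0, shared_cache=0.0, sharers=1):
    """Core area plus private L2 plus an equal slice of the shared cache."""
    return core_area + private_l2 + shared_cache / sharers


# Placeholder values, not measured from any die shot.
BIG, LITTLE = 2.0, 1.0          # core-only areas in mm²
SHARED = 4.0                    # shared cache area in mm²
N_BIG, N_LITTLE = 2, 6

# Option 1: split the shared cache across every core.
big_all = area_per_core(BIG, shared_cache=SHARED, sharers=N_BIG + N_LITTLE)
little_all = area_per_core(LITTLE, shared_cache=SHARED, sharers=N_BIG + N_LITTLE)

# Option 2: charge the shared cache to the big cores only.
big_only = area_per_core(BIG, shared_cache=SHARED, sharers=N_BIG)
little_only = area_per_core(LITTLE)

print(f"split over all cores: big={big_all:.2f}  little={little_all:.2f}")
print(f"charged to big only:  big={big_only:.2f}  little={little_only:.2f}")
```

With these placeholder numbers the big core goes from 2.5 mm² to 4.0 mm² depending on the policy, while the little core goes from 1.5 mm² to 1.0 mm², which is the bias described above.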

Also for reference, IMO:

Arm's Xxxx Big = Apple/Qualcomm/Intel/AMD's P cores
Arm's X/A7xx Mid = AMD's Zen Compact/Qualcomm's E cores
Arm's A7xx Little = Apple/Qualcomm/AMD's E cores
Arm's A5xx Tiny = Intel's LPE cores

Although it can be argued Qualcomm's E cores are actually mid cores once L2 is accounted for. L2 is also what determines if Arm's Xxxx cores are big vs mid and Arm's A7xx are mid vs little

[-]

TwelveSilverSwords@reddit (OP)

X925 is twice the size of X4. That is terrific. I wonder where X930 will go.

Thanks for pointing out the error. I excluded L2 area for X925, but not X4 and A720. Will edit the table.

[-]

Vince789@reddit

Yea, but I believe the X925 being twice the size of the X4 is mostly due to HP libraries being used instead of HD libraries

From Arm the microarchitecture changes don't seem to be enough to explain the die size doubling

It's similar to how for core only area, Zen5 is about 50% larger than Zen5c (excluding pL2), despite featuring mostly the same microarchitecture

[-]

TwelveSilverSwords@reddit (OP)

Supposedly Oryon-L also uses HP library, so the 2.1 mm² size is impressive.

[-]

Vince789@reddit

Agreed, IMO Oryon-L is more impressive than Oryon-M

Another interesting thing is Oryon seems to perform better in GB vs SPEC, sadly we don't have more benchmarks on Android

Will be interesting to see OryonV3 with more benchmarks on WoA/Linux

[-]

MMyRRedditAAccount@reddit

Only the initial batch of devices seeded to media performed well in GB6 (~3.3k 1T, ~10k nT). Retail devices are much lower (2.9-3k 1T and ~9k nT), and performance drops even lower on Chinese devices if you disguise the Geekbench application. You won’t be getting anywhere close to the claimed performance in “normal” apps.

[-]

signed7@reddit

Note that while Qualcomm is behind in PPC/IPC, their cores seem able to clock higher at a similar power usage to what others draw at lower clocks.

[-]

Wh1teSnak@reddit

Quick question: Is there anything I could read about the relationship between the clock speed and the power consumption? I always assumed they are linearly related but I guess that is not true looking at recent examples.

[-]

TwelveSilverSwords@reddit (OP)

Power consumption increases superlinearly with clock speed.

Power ∝ (Frequency)^n

n is usually 2 or more.
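One way to see why power grows faster than clock speed is the textbook CMOS switching-power approximation, P ≈ C · V² · f. The sketch below assumes voltage has to rise in step with frequency (the `voltage_exponent` parameter, a simplification that real voltage/frequency curves only loosely follow) and ignores leakage. The function names are made up for illustration.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Classic CMOS switching-power approximation: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def relative_power(freq_ratio, voltage_exponent=1.0):
    """Power ratio for a given clock ratio.

    Assumes voltage scales as freq_ratio ** voltage_exponent:
    0.0 means voltage stays fixed (power is linear in frequency),
    1.0 means voltage rises linearly with frequency (power ~ f^3).
    """
    voltage_ratio = freq_ratio ** voltage_exponent
    return dynamic_power(1.0, voltage_ratio, freq_ratio)


for ratio in (1.0, 1.1, 1.2, 1.5):
    print(f"{ratio:.1f}x clock -> {relative_power(ratio):.2f}x power")
```

Under that assumption a 20% clock increase costs roughly 73% more power, which is why the top of a power curve is so steep.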

[-]

calcium@reddit

AFAIK there is a link between the two, but not to the point that you'd otherwise think. A lot has to do with the architecture of the product, so comparing an x86 chip and ARM won't be the same, and neither will comparisons between generations of chips, say something like Zen3 vs Zen4.

[-]

Balance-@reddit

This is quite cool!

Seems Oryon-M is a beast in PPA, and Oryon-L is also very competitive.

Those high densities should allow Qualcomm to bundle more cores in comparable SoCs. Hopefully we will see Oryon soon in the Snapdragon 7s, 7 and 7+ series.

[-]

Famous_Wolverine3203@reddit

Oryon does sacrifice PPW for PPA. It's barely better than the 8 Gen 3 E cores on 4nm.

[-]

Vince789@reddit

Also Oryon-M's PPA isn't as impressive once you account for the huge sL2

[-]

boredcynicism@reddit

Is Oryon-L based on X925?

[-]

DerpSenpai@reddit

Oryon-L is a ground up design by the team of Nuvia. Same thing as Oryon-M. 100% independent from ARM

[-]

TwelveSilverSwords@reddit (OP)

Just 3 years after the Nuvia acquisition, Qualcomm has already put out 3 cores: Oryon, Oryon-L and Oryon-M.

Impressive?

[-]

Famous_Wolverine3203@reddit

Oryon has been in the works since 2020

[-]

TwelveSilverSwords@reddit (OP)

The Phoenix core in X Elite is certainly not identical to the one developed by Nuvia before the acquisition. That's what court filings say.

ARM requested that Qualcomm destroy the Nuvia IP. Qualcomm then sequestered the Nuvia IP, redesigned the Phoenix core to remove the Nuvia IP, and submitted it to ARM.

u/-protonsandneutrons- can correct me if I am mistaken.

[-]

Famous_Wolverine3203@reddit

It's unlikely to be a complete redesign. The server DNA of Oryon is very apparent. They probably iterated on it.

[-]

Raikaru@reddit

Impossible, because Oryon-L was developed before the X925 was released

[-]

boredcynicism@reddit

That depends on how close Qualcomm is with ARM, surely. Apple started working on 64-bit ARM cores before the 64-bit architecture was publicly defined.

[-]

Raikaru@reddit

Qualcomm used to also make custom cores at the exact same time and they got 64 bit cores by dropping them with the Snapdragon 810.

[-]

boredcynicism@reddit

I don't know the exact state, but there may be a reason why they had such a serious falling out, and for the involvement of Nuvia: https://www.pcworld.com/article/2497912/arm-will-cancel-qualcomms-license-to-make-the-snapdragon-x-elite.html

[-]

TwelveSilverSwords@reddit (OP)

It's a custom core designed entirely in-house by Qualcomm.

[-]

xCAI501@reddit

Qualcomm's Oryon cores have outstanding PPA. Oryon-M has better PPA than A18-E and Cortex A720.

The PPC of Cortex A720, A18-E and Oryon-M is almost identical. The much higher performance of Oryon-M is purely due to its higher clock speed.

The same is true of Oryon-M's higher PPA compared to A18-E, which has nearly equal area: it comes from the higher clock speed. I wonder how high an A18-E could clock if Apple pushed it.

[-]

TwelveSilverSwords@reddit (OP)

The Apple E-cores in M chips tend to be clocked higher. The E-core in M4 can run up to 2.9 GHz.

[-]

Noble00_@reddit

Nice! Just what I was looking for from your other discussion. I mentioned how Oryon-M was just as competitive with other efficiency cores but didn't know the size. Seems like Oryon-M is class leading with PPA, really impressed.

[-]

VenditatioDelendaEst@reddit

That said, it is more of a PPA core than an efficiency core.

https://i.imgur.com/1NUTOH3.png

[-]

MiniRusty01@reddit

Me looking at all this not understanding a single thing 👁️👄👁️