Total energy used to complete one thread is therefore over 23 J when run on P cores, and less than 1.7 J when run on E cores. E cores therefore use only 7% of the energy that P cores do performing the same task.
… P cores at maximum frequency take 9.2-9.7 seconds per thread, and use about 2,500 mW per thread. E cores running low QoS threads at close to minimum frequency take about four times as long, 38.5 seconds…
TwelveSilverSwords@reddit (OP)
How? How is the difference so vast?
RealPjotr@reddit
They do it a lot slower.
Famous_Wolverine3203@reddit
This is energy, not power. So even if they're doing it slowly, the energy consumed is so minimal that it justifies it.
elephantnut@reddit
depends on what you’re optimising for, if you have time sure you can wait 4x as long. applicable to almost all CPUs, energy consumption just isn’t really a factor for most people if they can get something done quicker.
tecphile@reddit
If they only use 7% energy but take 4x as long, that still means they only use 25-30% of the power of the P-cores.
As someone who's been using, since launch day, an M1 Air exclusively on low-power mode, this is very good news.
It seems that Apple has focused on making the P-cores more performant and the E-cores more efficient with each generation.
melberi@reddit
Actually it is the other way around. 7 % of energy use over 4x time means the active power consumption during the task is 7% divided by 4 or only around 2 %. Energy is power multiplied by time.
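To make the arithmetic concrete with the figures quoted from the article (about 23 J over ~9.5 s on a P core, about 1.7 J over 38.5 s on an E core):

```swift
// Power = energy / time, using the article's per-thread figures.
let pCorePower = 23.0 / 9.5    // ≈ 2.42 W, matching the ~2,500 mW quoted
let eCorePower = 1.7 / 38.5    // ≈ 0.044 W, i.e. ~44 mW

print(eCorePower / pCorePower) // ≈ 0.018, so roughly 2% of P-core power
```

The ~2.42 W also matches the ~2,500 mW per thread quoted above, which is a nice sanity check.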
tecphile@reddit
As an electronic engineer, I should be ashamed I got that wrong but it's been a while :)
It's unbelievable how good Apple's E-cores have gotten. They are the sole reason Apple was able to cram the M4 inside a 5.1mm thick fanless iPad.
Qualcomm's Oryon cores probably give you 80-90% of the performance found in the M3 gen P-cores. But they have no answer for these E-cores. Heck, I'd bet they don't even have an answer to the M1 gen E-cores. That's probably why we still haven't seen a fanless Windows competitor to the MacBook Air.
CarbonatedPancakes@reddit
I think where the E cores come in handy are with tasks that need to be completed, but aren’t as time sensitive.
So for example, in an app that loads a bunch of data from the internet, there’s stuff that the user will probably want to see but won’t be looking at immediately. Devs can take advantage of this and offload decoding and processing of that “later” data to the E cores, with P cores only handling data that’s needed immediately.
The benefit is twofold: first, the P cores burn through their work more quickly because they don’t have as much queued up, and second, the P cores get to sleep more since that “later” data is already decoded when it’s needed. Both allow the app to feel more snappy and to save more energy than if the P cores had been taking care of everything.
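A minimal sketch of how that might look with GCD quality-of-service classes (the decode function and data here are made up for illustration; on Apple silicon the scheduler steers low-QoS work toward the E cores):

```swift
import Foundation

// Hypothetical stand-in for real decoding/processing work.
func decode(_ data: Data) -> String {
    String(decoding: data, as: UTF8.self)
}

let visibleData = Data("shown immediately".utf8)
let laterData   = Data("needed later".utf8)

// Data the user is about to see: high QoS, eligible for P cores.
DispatchQueue.global(qos: .userInitiated).async {
    print("visible:", decode(visibleData))
}

// Prefetched "later" data: low QoS, so it can run on E cores
// without competing with the urgent work above.
DispatchQueue.global(qos: .utility).async {
    print("prefetched:", decode(laterData))
}

// Keep the process alive long enough for the async work to finish
// when run as a script.
Thread.sleep(forTimeInterval: 1)
```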
mdedetrich@reddit
The best example of this is Spotlight indexing, which runs purely on E cores
Miserable_Fault4973@reddit
Power consumption increases roughly with the cube of frequency, whereas performance is linear in frequency. Therefore the energy to perform a task scales with frequency squared.
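Spelled out, under the usual first-order model where voltage scales roughly linearly with frequency:

```latex
P \propto C V^2 f \quad\text{and}\quad V \propto f \;\Rightarrow\; P \propto f^3
t \propto 1/f \;\Rightarrow\; E = P \, t \propto f^3 \cdot f^{-1} = f^2
```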
RealPjotr@reddit
Not if you're in a hurry. 🤷
Power and performance never scale linearly; power grows superlinearly with clock speed. We're just focusing on some mid point of that curve in each market segment. Voltage and clock variation will only cover a certain span of the curve. P and E cores allow a better spread within one piece of silicon, but cost transistors.
Famous_Wolverine3203@reddit
More than aware. Just saying that's what E cores are for. They do tasks slower, but the efficiency advantages are worth it.
noneabove1182@reddit
for others:
this is similar to how it's easier to do the same amount of work slowly over a long period of time instead of all at once in a short burst
think about the energy it takes to run 100m as fast as you can vs walking that same distance. you get there in both cases, but one uses a TON more effort and energy
it's a similar concept, there's efficiency lost by getting it done as quickly as possible, but sometimes you just can't afford to wait for the slower completion
Adromedae@reddit
Because their approach to estimating power is extremely flawed.
TwelveSilverSwords@reddit (OP)
This is the answer
Jonny_H@reddit
Yup - a few things obviously stand out:
- Boost modes aren't efficient. We know this. They're tuned for responsiveness rather than power.
- It's using on-chip power estimation, which can be extremely inaccurate in many cases, doubly so at lower current draws.
- This also isn't actually the power used to complete the task, since the SoC fabric, memory controller, memory chips themselves etc. all need to be running to actually do anything. So there'll be an additional (not quite, but often close enough to) constant power draw, meaning the "total processing system" power delta would likely be much smaller. And that assumes the values used are actually comparable; shared units are often "assigned" to a single counter to avoid double-counting things like cache or fabric power.
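To put rough numbers on that last point (the 0.5 W constant platform draw below is purely an assumption for illustration, not a measured figure):

```swift
// Core-only figures from the quoted article, plus an ASSUMED constant
// platform power (SoC fabric, memory controller, DRAM) of 0.5 W.
let pCorePower = 23.0 / 9.5     // ≈ 2.42 W per P-core thread
let eCorePower = 1.7 / 38.5     // ≈ 0.044 W per E-core thread
let platformPower = 0.5         // watts; illustrative assumption only

let pSystemEnergy = (pCorePower + platformPower) * 9.5   // ≈ 27.7 J total
let eSystemEnergy = (eCorePower + platformPower) * 38.5  // ≈ 20.9 J total

// Core-only numbers suggest a ~13x energy gap (23 J vs 1.7 J); with a
// constant platform power included, the whole-system gap is only ~1.3x.
print(eSystemEnergy / pSystemEnergy)                     // ≈ 0.76
```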
Adromedae@reddit
Yup. It is one of the dangers of people using blog posts as references, since most of these conclusions/insights haven't been peer reviewed.
Some people underestimate how hard it is to get power data from within the SoC itself; in fact it's very hard to get proper power data on even the package as a whole.
The best we can do now in terms of intra-chip power estimation is to read current from the rail. But a rail feeds multiple systems, so we can't really isolate a single IP. It also requires a lot of information about the SoC itself, which is usually proprietary, so it must be done in collaboration with the vendor.
Just_Maintenance@reddit
P cores at max frequency vs E cores at min frequency