M4-powered MacBook Pro flexes in Cinebench by crushing the Core Ultra 9 288V and Ryzen AI 9 HX 370
Posted by KingDragonOfficiall@reddit | hardware | 316 comments
KingDragonOfficiall@reddit (OP)
"The 14-inch MacBook Pro managed a single-core score of 174, and a multi-core score of a whopping 971. These results are astonishing to say the least, considering the single-core improvement over the M3 sits at a decent 20%, while the leap in the multi-core department is an astronomical 37%. Even the higher-end M3 Pro trails behind the M4 by almost 8%. For a generational upgrade, these numbers are extremely promising."
"When compared to its competitors, the M4 CPU in the 14-inch MacBook Pro appears to be at an undisputable advantage. When compared with Intel's Core Ultra 9 288V 'Lunar Lake', the M4 comes out a whopping 42% ahead in single-core, and 62% ahead in multi-core performance. AMD's Ryzen 9 AI HX 370, despite being more of a competitor for the M4 Pro, falls behind the M4 by a whopping 33% in single-core performance, while being almost neck and neck in multi-core despite its much more generous power envelope."
obicankenobi@reddit
These are insane numbers
chapstickbomber@reddit
Crazy what you can do when you own the entire hardware stack and OS and build fat chips on the freshest node. Wow.
newhereok@reddit
And billions to throw around.
coylter@reddit
I am so excited to finally be able to move away from the absolute piece of crap that is windows. Bring on mac gaming and put that god forsaken OS to rest.
i5-2520M@reddit
Windows is the only platform that cares about long-term compatibility, and that is both its biggest strength and its biggest weakness.
There are no other platforms currently where 20 year old games and programs just natively work without an emulation layer or other trickery.
peakdecline@reddit
I'm on the other end.... I'd love this hardware to not be encumbered by Apple software which I find very frustrating to use. And the projects to run Linux on it are nowhere close to daily usability.
MeteorOnMars@reddit
Agreed. The M chips would be wonderful to be unleashed outside the Apple ecosystem. An M handheld gaming system would make me very happy.
WestcoastWelker@reddit
I wonder how well WoW will run, since it’s native on Mac these days.
CarbonatedPancakes@reddit
WoW has had a Mac native build since day one back in 2004. I first played it on an iMac G5, ran great.
WestcoastWelker@reddit
Indeed. But when I tried the M2 Pro it was juuust shy of enjoyable for me in terms of WoW performance. Hoping this gen might break that barrier.
Zokorpt@reddit
Runs pretty well on an M2 Ultra, but at 5K it needs a bit of help to play with everything maxed; I had to use AMD FSR. At the time I compared it with a 3090 at 2560x1440. Resident Evil pushes it hard at 5K, though. Pity it doesn't have League of Legends, Cyberpunk, and Diablo natively, otherwise I wouldn't have needed a gaming PC.
Famous_Wolverine3203@reddit
M4 isn’t a fat chip by any standard. Mediatek is building chips for phones with more transistors than the M4.
RegularCircumstances@reddit
Transistors yes but with modems onboard and still less silicon area.
This is the wrong way to make this point since the modem is extra value independently. I would just point out that Lunar Lake is on N3B at 139mm^2, and N3E relaxes density a bit vs N3B and costs less, so a 160-165mm^2 (I forget but around it) N3E die with much more performance sounds exactly right and isn’t that crazy. It is probably similar to the M3 cost wise I bet, but both a step up from the M2 and M1.
Famous_Wolverine3203@reddit
Modems are 15-20mm2 of silicon area. Even removing that, MediaTek is using almost as many transistors as the M4.
Successful_Bowler728@reddit
Entire hardware? Apple doesn't make the RAM, NAND, screen, or modem.
auradragon1@reddit
They do own the entire stack, but they don't build the fattest cores.
0xd00d@reddit
The M4 is certainly shaping up like the M1 in terms of being an upgrade target. I'm putting my hands together for that tiny M4 Mac mini. That's gonna be a no-brainer of a computer.
Impressive-Level-276@reddit
I really can't believe that Apple crushed two companies that have made CPUs for 50 years...
AZ_Crush@reddit
Apple's chip design teams are full of ex-employees from those companies.
Impressive-Level-276@reddit
But it's still insane
Secure-Alpha9953@reddit
Is it tho? I would hope a trillion dollar company could throw money at that problem.
not really all that insane, bud
commo64dor@reddit
Ahem, PowerPC, ahem.
You've jumped into this world for the first time.
East-Love-8031@reddit
Is this huge improvement just because the M4 has SME/SVE units that weren't there before ARMv9? Isn't Cinebench basically a SIMD benchmark that previously favoured Intel/AMD because they have AVX, while Apple Silicon was only keeping up because it's so fast everywhere else?
I was expecting Cinebench to have to be recompiled to take advantage of the new instructions. Does anyone know if Cinebench exploits M4 SVE in the current version?
So many questions...
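One way to at least check what the hardware advertises, independent of what Cinebench does with it: macOS exposes CPU feature flags through sysctl. A minimal sketch in C is below; the exact key names (e.g. hw.optional.arm.FEAT_SME) are my assumption of what the M4 reports, so run `sysctl hw.optional.arm` to see what your machine actually lists.

```c
// Minimal sketch: query macOS sysctl CPU feature flags to see what the
// hardware advertises. The exact key names (e.g. hw.optional.arm.FEAT_SME)
// are an assumption; list hw.optional.arm yourself to confirm them.
#include <stdio.h>
#include <sys/sysctl.h>

static void check_feature(const char *key) {
    int value = 0;
    size_t len = sizeof(value);
    if (sysctlbyname(key, &value, &len, NULL, 0) != 0) {
        printf("%-32s not reported\n", key);   // key absent on this CPU/OS
    } else {
        printf("%-32s %s\n", key, value ? "yes" : "no");
    }
}

int main(void) {
    check_feature("hw.optional.arm.FEAT_SME");      // scalable matrix extension (assumed key)
    check_feature("hw.optional.arm.FEAT_DotProd");  // NEON dot-product feature
    check_feature("hw.optional.AdvSIMD");           // baseline NEON
    return 0;
}
```

Whether the flag is set says nothing about whether Cinebench's binary actually emits those instructions, of course.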
Adromedae@reddit
Cinebench still uses NEON on Apple Silicon, so it does use SIMD.
The main problem with x86 is that AMD and Intel are still unable to make proper fat cores, because their cultures are still focused on area optimizations that no longer make much sense.
StarbeamII@reddit
Aren't Intel P cores massive?
Adromedae@reddit
Not in terms of out-of-order resources like Apple's in the M-series.
theQuandary@reddit
Looking at P-cores and excluding the big caches, the situation shows the exact opposite.
As an interesting point, look at how many cores you could fit in the space of 8 of Meteor Lake cores.
https://www.reddit.com/r/hardware/comments/1fuuucj/lunar_lake_die_shot/
Adromedae@reddit
Out-of-order resources refers to microarchitectural details, such as the size of the register file(s), the ROB, the predictor structures, the widths of fetch and issue, etc.
Not guessed areas.
theQuandary@reddit
Those "guesses" are accurate within a percent or so.
Intel/AMD aren't refusing to make wider cores. As this shows, they simply cannot add more resources without further exploding an already bloated core size.
This of course raises the question: if ISA doesn't matter, why does x86 need so much more space for so many fewer resources?
Adromedae@reddit
No. They are most definitively not.
Here are 3 things this sub needs to start accepting it doesn't know with any certainty for a modern SoC:
Yield and variability data
The size of actual structures within the die
Power consumption within the die and for the package
Those are all rather proprietary information, which nobody is going to risk their job leaking.
A lot of the discussions in this sub almost invariably end up being akin to that story of a bunch of blind men trying to describe an elephant.
dahauns@reddit
Cinebench 2024 in general uses very little vectorization (and practically all of it is 128bit):
https://chipsandcheese.com/p/cinebench-2024-reviewing-the-benchmark
CalmSpinach2140@reddit
SME and SVE aren't used/supported in Cinebench 2024.
East-Love-8031@reddit
Sounds like the M4 is smashing the competition in the benchmark even with a huge handicap. The moment Maxon compiles it to take advantage of the new instructions, it's all over.
TwelveSilverSwords@reddit
Well, Geekbench 6 does use SVE and SME...
Klinky1984@reddit
It's pretty impressive Apple can now truthfully make claims about being faster than PCs. 33% single threaded performance advantage is pretty insane.
AnuroopRohini@reddit
Remember, they are comparing the M4 with laptop chips, not desktop PCs; on desktop the M4 will be destroyed by Intel and AMD.
renaissance_man__@reddit
The m4 ipad pro soundly beats the current desktop chip offerings from both Intel and AMD in single-core.
Klinky1984@reddit
Is that because they're objectively faster or because they have more silicon? What about a desktop configured M4 Ultra, if Apple decides to give love to the desktop.
basil_elton@reddit
How is Lunar Lake, and the 288V at that, a competitor to the M4? In a MacBook Pro?
The sole reason why the 288V exists is to provide higher sustained iGPU performance for gaming-centric devices because that is the only thing you would notice from increased PL1, at this power envelope.
NeroClaudius199907@reddit
Gaming is not really a thing on Macs. Besides, LNL is competing with M4 Airs due to price.
mmkzero0@reddit
People really need to stop propagating this outdated notion; it hasn't been true for a long time thanks to Wine, the people over at CodeWeavers, Apple adding AVX support to Rosetta 2, and frontends like CrossOver or Whisky.
macOS even has its own variant of ESync/FSync called msync.
Aside from DXVK (for DirectX 8-11 via Vulkan by way of MoltenVK) and D3DMetal (for DirectX 12) enabling the graphics translation layer, there is even a DirectX 11 to Metal layer actively being worked on: dxmt.
Just a few games running so you can see that I'm not just talking out of my ass: Cyberpunk 2077 (on an M3 Pro), FF7 Remake, and God of War.
Mind that these videos are a bit older, from before GPTK2 and AVX support, which many games benefitted from.
i5-2520M@reddit
This is like Linux users acting like gaming is finally good on linux, but even less true.
chapstickbomber@reddit
Luckily, normies can do these things reliably without error
theQuandary@reddit
If it can be fully explained and walked through in a 3:24 video, it's not too hard for "normies".
https://www.youtube.com/watch?v=xowJ3b-pt-U
ConsciousData685@reddit
lol
rejoicerebuild@reddit
Games running poorly is probably not a good showcase, though it definitely is cool to see them running at all.
rejoicerebuild@reddit
50fps at 1440p with optimized settings is not good performance..
Exist50@reddit
Lmao, no. That's not why it exists. Lunar Lake literally exists to compete with this exact line of chips from Apple. There's no better comparison to make. Doubly so since the entry MBP is much more like the devices we actually see LNL in.
basil_elton@reddit
I said the sole reason why that particular bin exists, not why LNL exists as a whole.
It should be clear to anyone by now that the reason TSMC nodes are superior is their performance-power curve, which is flatter over a larger operational window and has a much less steep fall-off at low power.
Compare an apple silicon chip on N3B to Lunar Lake at the same power and then we'll see the efficiency advantage decrease significantly.
Exist50@reddit
What bin? LNL has very few SKUs to begin with, and they don't differ all that much.
What? Apple's using N3E, which by TSMC's numbers, at least, should be very similar to N3B.
Famous_Wolverine3203@reddit
TSMC kinda lied? N3B in power characteristics seems very similar to N4P. It even seems slightly worse at lower voltages.
N3E also has a 10% advantage in performance over N3B which makes it the actual fixed version of N3B.
Skip to 6:40
https://youtu.be/QK_t1LfEmBA?feature=shared
A18 pro and A17 pro share the same E core architecture, yet N3E offers a 10% boost to performance at the same power compared to N3B.
RegularCircumstances@reddit
RE: lower voltages - you’re thinking about the GPU here when you say that right? Because the GPU seemingly suffered on the A17 Pro vs the A16 but had a new architecture or whatever
Exist50@reddit
That does assume no design optimizations.
Famous_Wolverine3203@reddit
Design optimisations in a span of 6 months resulting in nearly a node’s worth of improvement?
Could be. But the original N3B was lacklustre compared to N4P used by the A16. And the general improvements across the board on the A18 pro (namely the GPU which shares the same architecture as the A17 pro), point toward N3E improvements rather than design ones.
Exist50@reddit
At least by TSMC's numbers, N3E would be 3-8% vs N3B. So leaving a couple percent gap. For a year of design optimization, that is absolutely achievable. Now, what that breakdown actually looks like, we'll probably never know.
Famous_Wolverine3203@reddit
For a year sure, but M4 on N3E came out 6 months later sharing the same fundamental design.
Them not giving numbers for N3B is exactly the reason why I think the improvements are from N3E than design. They probably knew N3B wasn’t that good of an improvement.
TSMC's figures were 3% more frequency at iso-power or 8% less power at iso-frequency. This is 12% more frequency at iso-power, which outperforms their figures by nearly 3x.
Maybe we’re both right and its from a combination of both design optimisations AND node improvements.
basil_elton@reddit
The 288V is the SKU that has a PL1 of 30 watts when everything down the stack has a PL1 of 17 watts.
That additional 13 watts is there to provide more power to the GPU during combined loads, i.e. gaming.
Where are the Cinebench 2024 MT score performance power curves?
Exist50@reddit
It's configurable in all cases. And most laptops seem to be pushing the upper end of the range.
Then by that logic, this should be a favorable performance comparison for LNL.
...We're talking about node comparisons.
basil_elton@reddit
That's not the reason for its existence. If it was just about configuring cTDP up and down as the OEM pleases, then there would be no need for the 288V to exist officially.
Cinebench 2024 doesn't benchmark both CPU and GPU at the same time using the same workload.
Yes, we are talking iso-node. So where are the graphs for Cinebench 2024 MT scores at different power levels?
Exist50@reddit
It exists so Intel can advertise certain "default" performance levels, mostly. You think the OEMs care about the listed TDP?
Yes, and? You were the one arguing the scores aren't comparable in a CPU workload...
There is no identical core made on the two nodes to compare. But you're the one saying TSMC is basically lying, so why not provide the proof yourself?
basil_elton@reddit
If the OEM does not touch cTDP, then Cinebench 2024 would give different scores with the 288V and every other SKU down the stack if they were put in the same chassis. That is the entire point.
I said that 30 W PL1 for the 288V would show itself when playing games.
I said no such thing. I said TSMC nodes are superior because of their flatter performance power curve for a larger operational window.
What I'm asking for is Cinebench 2024 scores at different power levels. I'll ask again - do they exist for both Lunar Lake and Apple M-whatever?
Because this thread is about Cinebench 2024 CPU scores.
Exist50@reddit
Which they do...
So why are you trying to invalidate the comparison with Apple?
...you do realize both are using TSMC nodes, right?
basil_elton@reddit
Again, not the point. For example, in the comparison Notebookcheck makes using their own data from the Asus Zenbook S14, there is virtually no difference between the 288V and 256V when running Cinebench. That can only mean that it is how Asus configures for that model, or that it is throttling, or a bit of both. There's the additional factor to consider that is Intel's DTT, but it's irrelevant in this context.
I'm saying that the extra watts available to the 288V during sustained load has a different purpose than boosting Cinebench 2024 MT scores.
I'm not invalidating the comparison. I'm saying that the comparison is hard to contextualize because other important data - mainly power consumption - is missing. Moreover some people have argued that the M4 in MBP will also go into the fanless MBA, so we can contextualize its performance and power dissipation from the iPad Pro data. But then, you would have to run Cinebench 2024 on iPad OS. Which is impossible AFAIK.
Yes. That is why I'm asking for the perf-power curves for both the M-chips and Lunar Lake running Cinebench 2024.
996forever@reddit
Until there are fanless systems using lunar lake, MacBook Air isn’t a proper competitor. The launch halo device, Asus Zenbook S14, has a PL1 of 28w on performance profile, which is higher than the M4, and the peak power of 40w is much higher than the M4’s peak.
https://www.ultrabookreview.com/69717-asus-zenbook-s14-oled-review/
Comparison of the chips using the 14” MacBook Pro is therefore perfectly valid.
basil_elton@reddit
Brother, the M4 only exists in the iPad Pro so far. How is a tablet a competitor to a notebook?
And how do you know for sure that when the M4 arrives in a Macbook, it would not also have higher power limits than what it currently has in the iPad?
996forever@reddit
In any case, the single core score isn't affected by the power limits.
basil_elton@reddit
Yes, it is in Cinebench. One can demonstrate it in action with any laptop CPU.
CalmSpinach2140@reddit
The single core in M chip Macs isn’t really affected by the power limits in Cinebench. It uses 10-11 watts in a Single thread R23 run, which the passive MacBook Air is capable of sustaining. https://www.computerbase.de/2024-03/apple-macbook-air-m3-test/#abschnitt_apple_m3_wird_passiv_gekuehlt
basil_elton@reddit
How do you know that the 5 W single-core CPU package power on Apple refers to the same thing as an x86 chip's CPU package power, which can easily be checked by third-party tools like HWiNFO, or by exposing vendor-provided performance counters like Intel's PMU?
CalmSpinach2140@reddit
ComputerBase either used powermetrics, a built-in macOS command, or measured at the wall. Those are the only two options on a Mac. If you want a more detailed review of power consumption on the Mac, I would wait for the M4 Mac review from Geekerwan.
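For anyone curious, a rough sketch of the first option is below: it just shells out to the built-in powermetrics tool and prints the power-related lines. The sampler name and flags are from memory of the man page and should be double-checked, and it has to run as root.

```c
// Rough sketch: sample CPU package power on macOS by shelling out to the
// built-in powermetrics tool and printing the lines that mention power.
// Flags (--samplers cpu_power, -i interval ms, -n sample count) are assumed
// from memory; verify against the man page. Must be run with sudo.
#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *pm = popen("powermetrics --samplers cpu_power -i 1000 -n 1", "r");
    if (!pm) {
        perror("popen");
        return 1;
    }
    char line[512];
    while (fgets(line, sizeof(line), pm)) {
        if (strstr(line, "Power") || strstr(line, "power"))
            fputs(line, stdout);   // e.g. a "CPU Power: ... mW" line
    }
    pclose(pm);
    return 0;
}
```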
VastTension6022@reddit
...did you read the title? you dont even need to read the article.
basil_elton@reddit
Does it say what the CPU SoC power consumption is during the benchmark?
CalmSpinach2140@reddit
These M4s will also go into the MacBook Air, which will have no fans, and also the base 14" MacBook Pro, which has a fan. We will know the power consumption when Notebookcheck releases their review, but it should be similar to the base M3 14" MacBook Pro.
basil_elton@reddit
Notebookcheck as far as I know doesn't have power consumption data at the SoC level for Macbooks.
CalmSpinach2140@reddit
They don't, no, but they use a Metrahit Energy multimeter for their idle and load tests. They also test power consumption with an external display to remove the display's power draw from the measurement.
https://www.notebookcheck.net/Our-Test-Criteria.15394.0.html
basil_elton@reddit
I won't get into the details, for I would then have to go look at my reddit history from two years ago, but the TL;DR is:
After a long exchange of comments with the former storage reviewer who wrote for a long time at the now-defunct AnandTech, I came to understand that bypassing the display (by connecting an external monitor so you don't measure its power draw) still doesn't tell the whole picture about laptop power consumption.
CalmSpinach2140@reddit
That's also why NBC measures the laptop's maximum load as well. The M4's maximum load should be around the 256V's while delivering much higher ST and MT.
Aliff3DS-U@reddit
Because it will eventually go into the MacBook Airs, which are also fanless.
NeroClaudius199907@reddit
I still have my 202/17 macbooks with fans. They're inaudible
no_salty_no_jealousy@reddit
It's really pathetic how these people are just bootlicking someone's claim overhyping the Apple M4 while no power figures have shown up.
no_salty_no_jealousy@reddit
The M3 Pro Mac can use up to 60W sustained; this M4 Pro must be using more watts than Lunar Lake too, so the score is not surprising.
lutel@reddit
x86 is dead, and it was obvious when the M1 was released. Even on desktop, the newest x86 CPUs give a 5% performance increase between generations, at 3-5 times the power usage of ARM. Still, lots of people like to live in the delusion that some new node will bring a miracle. It is all about the ISA; it dictates how efficient an architecture can be.
ExeusV@reddit
Because your CS prof told you so?
Here's take from industry veteran:
https://chipsandcheese.com/p/arm-or-x86-isa-doesnt-matter
theQuandary@reddit
They are simply wrong. Even their own sources don't agree with them (eg, the Haswell paper showing that the decoder used 22% of total core power on integer workloads).
They refuse to address the obvious counter that "if ISA doesn't matter, why did ARM drop their old ISA and do a complete redesign?"
Strawmen like "ARM has uops too" abound (not all uops are created equal nor is the process for getting uops equally efficient).
Worst of all, he avoids the design efficiency consideration. Designing an ARMv8 chip is easier (the M3 P-core is almost half the size of Lion Cove) and the lack of legacy garbage and edge cases means you can go faster without breaking compatibility.
Finally, I was told by the author on one occasion that CPU ISA doesn't matter, but GPU ISA does matter. Both are Turing Equivalent and if it holds for CPUs, it should absolutely hold for GPUs too. Likewise, he absolutely would NOT be arguing that branch delay slots don't matter or that register windows don't matter. Only the bad ideas of x86 get a pass. The more obvious conclusion is that ISA actually matters for both GPU and CPU.
ExeusV@reddit
I'd guess that because the talk is almost always about perf or energy eff.
theQuandary@reddit
Those are related. ARM gets as much of a performance increase in 1 year as AMD gets in two years while ARM spends $1.1B and AMD spends almost $6B. That's a massive advantage and even more incredible when you realize that ARM has tons more products than AMD and those products must work across many different designs and many different fabs and fab nodes.
Snapdragon_865@reddit
Internally they're similar. Remember that x86 instructions are decoded into micro operations
lutel@reddit
No, they are not similar. What is great about the ARM ISA, and can never be achieved on x86, is fixed-length instructions, which can be handled very effectively by the branch predictor. You will never be able to achieve the efficiency of ARM with the x86 ISA. If you could, you would see phones built on x86, but it never happened; Intel and AMD never designed such efficient chips, because it is IMPOSSIBLE.
basil_elton@reddit
That's a very simplistic view that overlooks the history of x86 development and why it needed variable length instructions - because of how severely limited early CPUs were.
theQuandary@reddit
This isn't true in the general case and certainly isn't true in this specific case. The smallest RISC-V SERV variants are smaller than an 8008 by gate count (not too far from 4004 gate count) and that's with a 32-bit instruction ISA where 16-bit instructions are optional.
The real answer is that they didn't know what in the world they were doing 50 years ago.
basil_elton@reddit
This is a severely distorted view of how engineering works in real life.
RISC-V engineers architect a CPU that uses the accumulated knowledge and experience of hundreds of thousands of engineering-hours over 50 years to come up with what you describe.
Throwaway reddit comment: They didn't know what they were doing back then.
theQuandary@reddit
How is it distorted?
My entire argument is that x86 made sense with what they knew at the time, but it doesn't make sense today and all those bad design decisions have negative consequences for x86 PPA (power, performance, area) while also driving up R&D direct cost to develop the cores and opportunity cost from all the extra time needed for development.
jorel43@reddit
X86 uses fixed length instructions, I don't know what you're talking about.
lutel@reddit
Lol, this is simply not true. x86 instructions vary from 1 to 15 bytes. They are way more complex and require complex decoding, which just can't be made efficient.
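To make the structural point concrete, here is a toy sketch (not real x86 or ARM decode logic): with a fixed 4-byte ISA every instruction boundary is known up front, while with variable-length instructions the start of each instruction depends on the decoded length of the previous one, which serializes the naive approach.

```c
// Toy illustration of why fixed-length decode parallelizes trivially while
// variable-length decode is naturally serial. toy_x86_length() is a
// placeholder for a real length decoder (real x86 lengths depend on
// prefixes, ModRM, SIB, displacement, immediates, etc.).
#include <stdio.h>
#include <stddef.h>

static size_t toy_x86_length(const unsigned char *bytes) {
    return 1 + (bytes[0] % 15);  // placeholder: real lengths are 1..15 bytes
}

int main(void) {
    unsigned char code[16] = {0};
    size_t n = sizeof(code);

    /* Fixed 4-byte ISA: instruction i starts at 4*i. All boundaries are
     * known immediately, so a wide decoder can work on many slots at once. */
    for (size_t i = 0; i * 4 < n; i++)
        printf("fixed    insn %zu at offset %zu\n", i, i * 4);

    /* Variable-length ISA: the start of instruction i+1 depends on the
     * decoded length of instruction i, so boundaries must be found in order
     * unless extra hardware (length prediction, uop caching) is added. */
    for (size_t off = 0, i = 0; off < n; i++) {
        printf("variable insn %zu at offset %zu\n", i, off);
        off += toy_x86_length(code + off);
    }
    return 0;
}
```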
BookinCookie@reddit
Efficient decoding on X86 is a solved problem. Today we have uop caches, great instruction-length predictors, and clustered decoding techniques. These make the cost of variable-length decoding negligible.
theQuandary@reddit
uops being the same is an Intel lie from the 1990s when they spent many millions on this marketing pitch to convince the tech media and corporations not to buy the much better RISC chips on the market at the time.
Further, if you look at identical code x86 often uses 15-30% fewer uops than ARM. This indicates that their "simple" uops are nowhere near as simple as they are claiming.
PeterSpray@reddit
If it's about ISA, then where's Qualcomm?
theQuandary@reddit
Oryon core is 2.55mm2 on N4P and Lunar Lake core is 4.53mm2 on N3B. Given that alone, the PPA of Oryon blows anything x86 out of the water and that's despite Oryon not being particularly efficient (though not so bad as a first-gen product).
I expect the Gen 1.5 in their upcoming phone chip to show far better power numbers and I suspect that the actual gen 2 will make very large gains in both performance and power consumption as they start picking the low-hanging fruit always present in 1st gen uarch.
Strazdas1@reddit
Same place Qualcomm always is - a courtroom.
lutel@reddit
Even if Qualcomm doesn't have as good an implementation as Apple, it is still way more efficient than x86. That's why you see Qualcomm CPUs in embedded devices, not x86.
Bulky-Hearing5706@reddit
Jim Keller, the man who designed the Apple A series chips, AMD Zen arch, and multiple Intel arch, said that ISA simply doesn't matter.
Rando redditor: ISA is everything
Glad this bullshit got downvoted to hell.
theQuandary@reddit
Jim Keller's full quote isn't what you claim.
If ISA doesn't matter, then why did ARM literally throw out their entire ISA and start over? They aren't stupid and didn't choose to give up backwards compatibility for no reason. Why did A715 increase decoders by 20% and lower decoder size by 75% simply by dropping the old ISA?
ARM32 is nowhere near as bloated as x86, so how much bigger would the difference be there? Looking at physical core size gives some indication. M3 core (excluding last-level cache) is 2.49mm2 on N3B. LNL's Lion Cove is 4.53mm2 (also excluding last-level cache) while also being on N3B. Being nearly twice the size for WORSE performance isn't anywhere near "just as efficient".
The Chips and Cheese article on the topic is simply wrong about all these things and others too. The only paper we have on x86 decode cost examined haswell and found that in normal integer workloads (the common workloads), the decoder was using 22% of the total core power. Why was that never mentioned (even though they link to the paper)?
Another question is about development time and budget. In theory, JS and C are the same (both are Turing Equivalent), but a hobby C compiler can probably generate faster code than a JS JIT backed by some of the smartest compiler writers around (Lars Bak is a legend) and hundreds of millions of dollars invested over 20-ish years.
x86 is a minefield. All the extra logic in that massive Lion Cove core has to be designed by engineers and validated by validation teams. This takes massive amounts of time and money. x86 bears this out: we see small gains every other year with x86, while ARM sees a similar amount of gains every single year. ARM designs passed x86 in IPC a half-decade ago (or more) in your phone (only the clock speed was lower). AMD spent nearly $6B on R&D in 2023 while ARM spent just $1.1B, despite offering way more cores validated on way more fab nodes than AMD. ARM designers aren't that much better. Instead, the ISA is better and gets out of the way, allowing them to focus on performance instead of legacy garbage.
auradragon1@reddit
Actually, he said it shouldn't matter in theory, but in practice it does have an effect due to the complexity of instruction design.
HotDribblingDewDew@reddit
This comment is what happens when you think that reading comments and threads on reddit with a sprinkle of chatgpt to help affirm your idiotic thoughts equals an opinion worth writing. Feel a little shame and save whatever shred of dignity that remains. Remove this post and promise to never write pure bullshit and act like it's fact again, please. I'm embarrassed for you.
NeroClaudius199907@reddit
mac sales are tanking
Traditional_Job6617@reddit
I swear, if this gen of MacBooks doesn't come with AV1 encoding... it's a serious L for Apple.
Little-Order-3142@reddit
Does anyone know a good place that explains why the M chips are so much better than AMD's and Intel's?
Famous_Attitude9307@reddit
One reason is that the cores on the M chips are in general bigger, or rather wider, more expensive to produce as well, and usually built on the newest node, the reason being that Apple is TSMC's biggest customer and gets the best prices. Also, Apple can afford expensive CPUs because they sell everything as a closed unit; you can't buy the CPU on its own, so they make their money by gimping all the stuff they actually have to buy in, and still make a huge profit on it.
Look at it this way: if Apple were making desktop CPUs (let's ignore the obvious software, ARM vs x86, and other reasons why this will never happen), then in order for Apple to make reasonable margins, their CPUs would be insanely expensive for just a little performance gain.
RegularCircumstances@reddit
This actually doesn’t explain as much as you would think.
Lunar Lake on N3B is 139mm^2 for the main compute die, and a 4-core P-core CPU complex (including the L3, as this is important for these cores in a similar way to Apple's big shared L2) is around 26mm^2, for cores that in Lunar Lake clock around 4.5-5.1GHz and deliver M2 ST performance, or M2 ST + 5-10% at best, and at more power.
Do you know what a 4P performance core cluster is on an M2? It’s about 20.8mm^2 on N5P.
Intel also has a big combined L1/L0 now and a 2.5MB private L2 for each P core, totaling 10MB of L2, plus 8 or 12MB of L3 depending on the SKU, though the area hit from 12MB is there either way (the marginal 4MB is fused off). In total for a cluster, Intel is using 10MB of L2 and 12MB of L3, vs 16MB of L2 with Apple.***
So Intel is using not only literally more core and total area, but also more total cache for a cluster of 4 P cores, and doing so on N3B vs N5P, with a result that is at best 10% or so better in ST at 2-3x the power, and modally from reviews maybe 5% better in ST and, again, much worse efficiency.
It’s really just not true they’re (even AMD) notably better with CPU area specifically. It looks even worse if you control on wattages — because getting more “performance” by ballooning cores and cache for an extra 20% frequency headroom at massive increases in power is the AMD/Intel way, except this isn’t really worth it in laptops.
***And Apple has an 8MB SLC that's about 6.8mm^2, but so does Intel on Lunar Lake at a similar size. Not a huge deal for area and similar for both.
JimmyCartersMap@reddit
Uhhh x86 bros I don’t feel so good
Suspicious_Comedian8@reddit
I have no way to verify the facts. But this seems like a well informed comment.
Anyone able to source this information?
RegularCircumstances@reddit
https://www.semianalysis.com/p/apple-m2-die-shot-and-architecture (M2)
https://www.reddit.com/r/hardware/comments/1fuuucj/lunar_lake_die_shot/ (Lunar Lake with source Twitter link & annotation — you can easily pixel count the area of a cpu cluster)
https://x.com/qam_section31/status/1839851837526290664?s=46
Pre annotated and area labelled Snapdragon X Elite Die
https://www.techpowerup.com/325035/amd-strix-point-silicon-pictured-and-annotated
Strix Point die
Easy. People here just have a very difficult time with their shibboleths, so we’re in year 2024 talking about Apple’s area and muh nodes when AMD and Intel have shown us nothing but sloppiness and little has changed. Lunar Lake on the CPU front would be a joke under any circumstance if X86 software weren’t a thing, because QC and MediaTek can either beat that at lower pricing one way or another or do something similarly expensive and blow them out of the water — even if they’re not as good as Apple, there are tiers and QC + Arm Cortex is clearly in second place on an overall (power performance area) analysis right now, IMHO.
RegularCircumstances@reddit
On the Qualcomm MT thing, here is CB2024 measured from the wall with an external monitor attached: notice that Qualcomm can get top-notch performance with a good power profile and efficiency; we just don't know what they look like below 30W or so — would efficiency improve or decline? Either way, at 35-45W these things are decent, nearly as fast as they would be at 80-100W, and even beat AMD's stuff at similar wattages. Note this is from the wall, though it might not be minus idle, so it's possible the others, AMD especially, would do better with that accounted for. Either way it's not that bad.
Notice that the M3 is 50% more performant iso-power than Lunar Lake or matches the MT performance of Lunar Lake around 40-45W at 1/2 the power. This is a part on the same node, nearly the same size (139 for Intel vs like 146mm^2 for the M3) with a 4 P + 4 E Core design, the same SLC cache size, blah blah. Intel also still has a bit more total CPU area devoted to it than the M3 does, afaict.
And it gets just blown out at 20W either way you slice it. Cinebench is FP but integer performance would follow a similar trend here.
AMD Entries:
Ryzen AI 9 365 (Yoga Pro 7 14ASP G9, 15W)
Ryzen AI 9 365 (Yoga Pro 7 14ASP G9, 28W)
Ryzen AI 9 HX 370 (Zenbook S16, 20W)
Ryzen AI 9 365 (Yoga Pro 7 14ASP G9, 20W)
Ryzen AI 9 HX 370 (Zenbook S16, 15W)
Ryzen 7 8845HS (VIA 14 Pro, Quiet 20W)
Intel Entries (SKUs ending in “V”):
Core Ultra 7 258V (Zenbook S 14 UX5406, Whisper Mode)
Core Ultra 9 288V (Zenbook S 14 UX5406, Fullspeed Mode)
Core Ultra 7 258V (Zenbook S 14 UX5406, Fullspeed Mode)
Qualcomm Entries:
Snapdragon X Elite X1E-80-100 (Surface Laptop 7)
Snapdragon X Elite X1E-78-100 (Vivobook S 15 OLED Snapdragon, Whisper Mode 20W)
Snapdragon X Elite X1E-84-100 (Galaxy Book4 Edge 16)
Apple Entry:
Apple M3 (MacBook Air 13 M3 8C GPU)
auradragon1@reddit
People are saying this and upvoting it? Hasn't it been proven over and over again that Apple cores are actually smaller than AMD and Intel cores?
BookinCookie@reddit
Apple’s P cores are wider in architectural width. They’re just efficient with area.
Vince789@reddit
Is that because of better physical layout design? More dense libraries? Or Arm vs x86 (Arm's cores are also smaller despite being wider architecturally)?
BookinCookie@reddit
I don’t know the specifics, but I guess it’s a combination of factors. Lower frequency targets in synthesis, more extensive HD library use, etc. ARM vs X86 shouldn’t make a big difference though.
porcinechoirmaster@reddit
I can take a shot at it, sure. It's nothing magic, but it is something that's hard to replicate across the rest of the computing world.
Apple has vertical control of the entire ecosystem. This means that you will be compiling your code with an Apple compiler, to run on an Apple OS, that has an Apple CPU powering everything. There is very limited backwards compatibility, and no need for legacy support. The compiler can thus be far more aggressive in terms of optimizations, because Apple knows what, exactly, makes the CPU performant and what kind of optimizations to use. They can also control scheduler hinting and process prioritization.
Their CPUs minimize bottlenecks and wasted speed. Rather than being a self-demonstrating non-explanation, I mean that they do a very good job of not wasting silicon or speed where it wouldn't make sense. There's no point in spinning your core clock at meltdown levels of performance if you're stuck waiting on a run out to main memory, and there's no sense in throwing tons of integer compute in when your frontend can't keep the chip fed. Apple's architecture does an excellent job ensuring that no part of the chip is running far ahead or behind of the rest.
They have an astoundingly wide architecture with a compiler that can keep it fed. There are, broadly speaking, two ways to make CPUs go fast: You can try to be very fast in serial, which is to say, going through step A -> B -> C as quickly as possible, or you can split your work up into chunks and handle them independently. The former is preferred by software folks because it's free - you don't need to do anything to have your code run faster, it just does. The latter is where all the realizable performance gains are, because power consumption goes up with the cube of your clock speed and we're hitting walls, but we can still get wider.
This form of working in parallel isn't exclusively a reference to SMT, either, it's also instruction-level parallelism where your CPU and compiler recognize when an instruction will stall on memory or take a while to get through the FPU and moves the work order around to make sure nothing is stuck waiting. The M series has incredibly deep re-order buffers, which help make this possible.
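A toy C example of that point: both functions below do the same arithmetic, but the first is one long dependency chain while the second exposes four independent chains that a wide out-of-order core (with deep re-order buffers) can keep in flight at once. This is just an illustration, not Apple-specific code.

```c
// Toy illustration of instruction-level parallelism: same work, different
// dependency structure. A wide out-of-order core can overlap the four
// independent accumulators in sum_parallel(), while sum_serial() forces
// every add to wait on the previous one.
#include <stdio.h>

#define N 1000000

static double data[N];

static double sum_serial(const double *x) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += x[i];                 // each add depends on the previous add
    return s;
}

static double sum_parallel(const double *x) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < N; i += 4) {
        s0 += x[i];                // four independent chains: the core can
        s1 += x[i + 1];            // keep several adds in flight at once
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    return s0 + s1 + s2 + s3;
}

int main(void) {
    for (int i = 0; i < N; i++)
        data[i] = 1.0;
    printf("serial:   %f\n", sum_serial(data));
    printf("parallel: %f\n", sum_parallel(data));
    return 0;
}
```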
Apple has a CPU that is capable of juggling a lot of instructions and tasks in flight, and compilers that can allow serial work to be broken up into forms that the CPU can do. This is how Apple gets such obscene performance out of a relatively lowly clocked part, and the low clocks are how they keep power use down.
ARM architecture has less legacy cruft tied to it. x86 was developed in an era when memory was by far the most expensive part of a computer, and that included things like caches and buffers on CPUs. It was designed with support for variable-width instructions, and while those are mostly "legacy" now (instructions are broken down into micro-operations that are functionally the same as most ARM parts internally), the chip still has to decode and support variable-width instructions, which means the frontend of the CPU is astoundingly complex and has width limits imposed by that complexity.
They have a lot of memory bandwidth. This one is simple. Because they rely on a single unified chunk of memory for everything (CPU and GPU), the M series parts have quite a bit of memory bandwidth. Even the lower end parts have more bandwidth than most x86 parts do outside the server space.
There's more, but that's what I can think of off the top of my head.
BookinCookie@reddit
Apple’s cores don’t rely on a special compiler to keep them fed (in fact, they’re benchmarked on the same benchmarks that everyone else uses, and they still perform exceptionally). Their ILP techniques are entirely hardware based.
trillykins@reddit
I think it's mostly down to Apple having full control over the entire ecosystem. Their chips don't have to be compatible with decades of software, operating systems, firmware, hardware, etc. If they run into a problem, like 32-bit support causing issues or whatever, they will just deprecate it and remove it.
It's like when people ask why ARM is so difficult on Windows when Apple could do it, the answer isn't magic or "good engineers." All of these companies have that shit. The answer is that Windows has an absolutely incomprehensible amount of software and hardware that it also needs to support, whereas Apple by comparison has, like, ten pieces of software and 3 hardware configs.
dagmx@reddit
That doesn’t explain why the performance stays high when run under Linux though.
People like to point to the full stack, but the processors run fast even when not using macOS
hishnash@reddit
Very wide design, lots of cache and aggressive perf/w focus at all points during development.
Being fixed-width ARM helps a lot here: not only is the decoder simpler, it is easier for compilers to produce more optimal code, as the compiler has more named registers to work with. (It's much easier for a compiler to break code down into small instructions than it is to optimally merge instructions into large CISC ones.)
Adromedae@reddit
The full stack doesn't make as much difference as people think. A lot of the commenters here just repeat what they have heard elsewhere.
Modern systems are designed with so many layers of abstraction, that in practical terms Microsoft and Apple end up having the same sort of layering and control over their systems software.
The key differentiator in regards to performance is usually the "boring" stuff: the microarchitecture, the customizations to the process node made by Apple's silicon team, the packaging (the silicon-on-silicon backside PDN, for example, and the on-package memory). That is, the stuff that is above the pay grade of most posters here.
And honestly, a lot of it is due to astroturfing as well. There has been a hilarious detachment from reality when you have posters making up crap where you'd think that Apple had managed to break the laws of physics.
In other words: Apple manages to design and manufacture some very, very well balanced SoCs, which tend to be 1 to 2 generations ahead of their competitors in one or several aspects: uArch, packaging, fabrication process.
EloquentPinguin@reddit
The answer is mostly: having really good engineers with really well-timed projects and a lot of money to help stay on track.
It is the combination of great project management with great engineers.
What exactly the engineers at Apple do to make the M-series go brrr on a technical level is probably one of the most valuable secrets Apple holds, but one important factor is that they push really hard at every step of the product.
If you and I knew the technical details, so would Intel and AMD, and they would do the same.
SteakandChickenMan@reddit
With all due respect you entirely dodged the answer. Tear downs of their chips exist and all major companies have access to the same info. It’s a combination of a superior core uarch with a solid fabric and in general a lot of experience making low power SoC infra. Apple is scaling low power phone chips up, everyone else is trying to scale datacenter designs down. And their cores are good.
EloquentPinguin@reddit
One does not simply copy the performance characteristics by looking at a teardown. Highly ported schedulers, deep queues, large ROBs, etc. are all not as simple as "Oh, Apple has it X-wide, so we'll do it too." There is not nearly enough public detail to understand how many of the most important parts of the uarch work. The biggest chip companies probably have more information, but it is far from simple.
Like whether and to what extent basic-block dependencies are resolved in which stages of the core for parallel execution, how ops are fused/split, how large structures support many ports efficiently, etc. etc.
It's just that in this context the question is why M-series CPU performance is usually so much higher and so much more efficient than its Intel and AMD counterparts, and your answer of better uarch, fabric, and low-power engineering is, I think, walking the line of begging the question.
Like, what makes their uarch better? What makes the fabric better? What about their low-power experience makes the M-series better? And why should scaling phone chips up be better than scaling datacenter chips down? And why don't AMD and Intel just do the same thing?
SteakandChickenMan@reddit
Intel and AMD need millions of units to sell for a given market segment before they execute a project. They cannot finance IP that is “single use”. They have to share a lot of IP across both their datacenter and client and that intrinsically imposes a limit of what they can do. Apple fundamentally is operating in a different environment - their designs don’t need to scale from 7W - 500W, they’re much more focused in low power client parts.
Obviously I’m oversimplifying, but the general premise holds. You can see this with apple’s higher TDP parts where performance scaling basically becomes nonexistent.
Adromedae@reddit
This would be a good start:
https://www.semianalysis.com/p/apple-m2-die-shot-and-architecture
NeroClaudius199907@reddit
Hardware + software
Dogeboja@reddit
What do you mean by software? Apple is using standard open-source Clang to compile code to a generic ARM target. Not much magic there. Their hardware is just so much better.
Pristine-Woodpecker@reddit
The Apple version of Clang/LLVM is not open source. They contribute a ton of stuff upstream, but not everything is. It's BSD licensed so they are under no obligation to do so.
(I'm not claiming Apple's version has magic performance enhancing stuff in their build! You probably get similar or close performance using the upstream, fully open source Clang/LLVM combo)
jorel43@reddit
The operating system
Plank_With_A_Nail_In@reddit
The software being used for the test wasn't made by Apple.
Pristine-Woodpecker@reddit
Hardware alone is more than enough. Clear from the SPEC benchmarks, which only exercise the CPU and show the same lead.
996forever@reddit
Multi-core score is similar to the HX370 in the Asus S16 on performance mode (33W sustained). Single-core is in another world.
Ar0ndight@reddit
M4 Max will be insane
gunmetalblueezz@reddit
*insanely priced
kukulkhan@reddit
Why do you say that? Premium PC laptops cost just as much as, if not more than, Macs.
gunmetalblueezz@reddit
A Legion with the latest i9 costs less than the base MacBook Pro 14 with M2 Pro: 512GB SSD and 16GB RAM on the Mac vs 2TB SSD and 32GB RAM in the Legion.
kukulkhan@reddit
When I looked up the laptop, it seemed more like a gaming machine. In contrast, MacBooks are sleek, quiet, and reliable, designed to run smoothly with minimal noise. Judging by all the vents on the back of the Legion, it’s likely much louder. I also suspect its performance drops significantly when it’s running on battery power.
Oh and let’s not forget that the price is crazy.
Successful_Bowler728@reddit
Render 4 hours daily on a Mac and let's see how long it lasts.
trololololo2137@reddit
works just fine, 16 inch model has no issues with cooling
Successful_Bowler728@reddit
One of the best Mac repair guys I know said that if you want a Mac to last, don't let it run heavy workloads for too long.
trololololo2137@reddit
that's probably a good idea on Intel e-waste
kukulkhan@reddit
I bet that although MBPs aren't workstations, the newer M4 MBPs will outperform some desktops and definitely all laptops, especially in performance/watt.
gunmetalblueezz@reddit
Yeah, I never contested that. Tbh I have both; I daily drive my base M2 Pro 14 and I love it.
kukulkhan@reddit
They're all tools, and I hope people buy whichever is the best tool for their needs. I too have a gaming PC and an M1 Max MBP. Hoping to upgrade this year.
Ar0ndight@reddit
"Legion with latest i9" means nothing, the Legion isn't even a premium PC laptop
auradragon1@reddit
Take the Legion laptop out in a work meeting and you'll get laughed out. No one would take you seriously.
diemitchell@reddit
Whatever you say pal
P_Griffin2@reddit
There is probably some truth to it, even if he made it sound a bit harsh.
NeroClaudius199907@reddit
The more you buy the more you save. Nvidia wants apple audience so bad
wolvAUS@reddit
Funnily enough a lot of AI people are buying Mac Studios now. Because the memory is shared, you can end up in crazy situations where you’re allocating 150GB+ VRAM to LLMs.
P_Griffin2@reddit
Macs are often preferred in software development.
Strazdas1@reddit
You mean the $5,000 Apple workstations that have that memory, not what most people think of as Apple products, which come with 8GB.
aelder@reddit
Yes, they're the cheapest way to get that much VRAM for LLMs.
zofran_junkie@reddit
You can load LLMs in system ram on any computer. You don’t need a Mac for that.
aelder@reddit
I said it's the cheapest way to get that much vram. There's an 8X bandwidth delta between a Zen 5 9950x and an M2 Ultra.
It is cool that you can load LLMs into system memory, but it's not the same thing.
zofran_junkie@reddit
If you’re trying to say that unified memory of Mac Pros is not system memory, then you’re wrong. It’s literally DDR5.
Comparing the 9950X to an HEDT platform is pretty disingenuous as well. Compare it to a current gen Threadripper Pro with 8 memory channels for a more even match.
aelder@reddit
I'm aware of that - what I'm saying is that the memory is directly available to the GPU as vram.
That helps, but it's still about half the bandwidth I believe. Maybe you could build a dual Epyc system and get enough memory channels to speed it up enough.
Of course running a system like that is using drastically more energy and generating more heat. The Mac does all that and sits quiet and cool on a desktop. There's a reason people are buying up the M2 Ultras for this stuff.
zofran_junkie@reddit
You’re right. I did the math and the 8 channel threadripper maxes out around 384 GB/s with DDR5-6000, which is still less than half of the M2 Ultra’s 800 GB/s.
A dual-socket Epyc 9124 system can hit just shy of 1 TB/s with all 24 memory channels populated, and it would be cheaper than an M2 Ultra system with an equal amount of memory, but most users aren't going to be comfortable dealing with enterprise hardware. Idle power would likely be higher too, around 100W-150W with a proper configuration.
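For anyone who wants to redo that math, peak bandwidth is roughly channels × transfer rate × 8 bytes per 64-bit transfer (a theoretical peak that ignores real-world efficiency). A quick sketch, with the configurations from this thread as the assumed inputs:

```c
// Back-of-the-envelope peak memory bandwidth: channels * MT/s * 8 bytes per
// 64-bit transfer. These are theoretical peaks, ignoring efficiency losses.
#include <stdio.h>

static double peak_gbps(int channels, int mts) {
    return channels * (double)mts * 8.0 / 1000.0;  // GB/s (decimal)
}

int main(void) {
    printf("Threadripper Pro, 8ch DDR5-6000 : %6.1f GB/s\n", peak_gbps(8, 6000));   // ~384
    printf("Dual EPYC 9124, 24ch DDR5-4800  : %6.1f GB/s\n", peak_gbps(24, 4800));  // ~922
    printf("Desktop Zen 5, 2ch DDR5-6000    : %6.1f GB/s\n", peak_gbps(2, 6000));   // ~96
    // Apple quotes ~800 GB/s for the M2 Ultra's unified memory.
    return 0;
}
```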
wolvAUS@reddit
Yep. I wonder how much $$$ an equivalent NVIDIA card would cost.
Successful_Bowler728@reddit
A 4090 GPU has twice the bandwidth of the M3 Pro.
Bakermonster@reddit
About $30k for an 80GB H100 nowadays, so $60k for just the two GPUs you'd need. It's common to put them in a machine of 8 GPUs (see the DGX H100), which I've seen go for $350k; 350k/4 is ~$88k for that two-GPU share.
That said, an equivalent Nvidia card is actually more the L40S, which can be slotted into a smaller build if you are so inclined. Each one has 48GBs, so to get to ~150GB you’d need three. There’s no DGX version however, so creating a machine with it is harder to price correctly.
Meanwhile you can get 192 GB unified memory with an M2 Mac Pro for $8.6k. Not nearly as powerful, no CUDA, but if memory is your primary consideration it’s a lot more price efficient.
zofran_junkie@reddit
That’s not VRAM though. That’s regular system RAM being used in the same way that AMD APUs use system RAM. Anyone can put hundreds of gigabytes of DDR5 in a computer to load massive LLMs. You don’t need a Mac for that.
Ar0ndight@reddit
As insanely priced as any premium laptop; look at a top-specced XPS.
no_salty_no_jealousy@reddit
More like insanely overpriced. Typical apple.
996forever@reddit
That too
xingerburger@reddit
Imagine it translated to gaming perf
champignax@reddit
It actually does. Pretty sure it can outperform a 4080
ThankGodImBipolar@reddit
Given that the M2 doesn’t feature hardware tessellation support that’s robust enough for Vulkan/DXVK support, I find that difficult to believe.
no_salty_no_jealousy@reddit
Sounds like comments made by apple cult, always being delusional.
Successful_Bowler728@reddit
Delusional and dumb.
champignax@reddit
I have a 4080 and a M2 Max so not really taking sides here. I know what I’m talking about.
iNfzx@reddit
:D
ExeusV@reddit
By which metric?
champignax@reddit
I ran some benchmarks on my M2 Max and 4080 and the gap was not that huge, so yeah, an M4 Max should be able to close it.
GreenMateV3@reddit
source: trust me bro
AbhishMuk@reddit
I guess someone could technically argue that with 96GB of RAM their MacBook is faster with a large LLM than a 24GB-VRAM GPU... but yeah, that's a stretch, and it's comparing CPUs with GPUs.
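Rough sizing for that scenario, counting weights only and ignoring KV cache and runtime overhead (so real usage is higher); the 70B parameter count is just an example size:

```c
// Rough LLM weight-memory estimate: parameters * bytes per parameter.
// Ignores KV cache, activations, and runtime overhead, so real usage is higher.
#include <stdio.h>

static double weight_gb(double params_billion, double bytes_per_param) {
    return params_billion * bytes_per_param;  // 1B params at 1 B/param ~= 1 GB
}

int main(void) {
    // A 70B-parameter model as an example size.
    printf("70B @ fp16  (2 B/param)  : %5.1f GB\n", weight_gb(70, 2.0));  // ~140 GB
    printf("70B @ int8  (1 B/param)  : %5.1f GB\n", weight_gb(70, 1.0));  // ~70 GB
    printf("70B @ 4-bit (0.5 B/param): %5.1f GB\n", weight_gb(70, 0.5));  // ~35 GB
    // All of these fit in 96GB+ of unified memory; even the 4-bit quant
    // overflows a 24GB GPU.
    return 0;
}
```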
MaverickPT@reddit
The metric that they made the fuck up
mycall@reddit
Give me 1TB RAM plz thx
Jrix@reddit
I bet it will be insanely complicated too.
Famous_Wolverine3203@reddit
Because AMD has way more threads. 24 vs 10. Cinebench loves threads. The appropriate comparison for AMD would be the M4 pro. Not the M4.
NeroClaudius199907@reddit
The appropriate comparison is already there: the M2 Pro, on 4nm like the 370.
Famous_Wolverine3203@reddit
M2 pro is 2 years old. It should be compared with Zen 4 then.
Plank_With_A_Nail_In@reddit
All that matters is what is actually available to buy, you make comparisons between the things you can actually have.
yrubooingmeimryte@reddit
It only matters "what is actually available to buy" if you have to buy one or the other at this exact moment.
NeroClaudius199907@reddit
Then if Apple has a node advantage... AMD should have a core advantage. I think that's appropriate.
Famous_Wolverine3203@reddit
AMD had node parity and couldn't beat Apple with Zen 4.
NeroClaudius199907@reddit
They did in MT. The 370 is faster than the M2 Pro.
Famous_Wolverine3203@reddit
You’re using a microarchitecture that launched 2 years after M2. Why not use Zen 4? Which launched the same time as M2?
NeroClaudius199907@reddit
It was faster
JoeDawson8@reddit
You aren’t comparing apples to apples champ.
maqcky@reddit
Yeah, that's the whole point. We are comparing AMD to Apple.
OK, I'll see my way out.
Strazdas1@reddit
That seems to indicate Cinebench cannot feed cores properly.
Plank_With_A_Nail_In@reddit
No just that the cores of one design are weaker than others.
Qaxar@reddit
Except that the HX 370 is a 12-core processor compared to the 10-core M4, which is on a better node. Chip design isn't why Apple has the advantage; it's the more advanced node and tight coupling between software and hardware.
CalmSpinach2140@reddit
It's not the tight coupling of software and hardware; it's mainly that the hardware is really good. The CPU performance of the M1 is the same in Linux as it is in macOS.
When Strix Halo, which is on N3E as well, arrives, it will still be behind in ST in CB2024. It is the chip design that makes Apple cores good. Plus, the M4 is 4P+6E with no SMT; obviously the HX 370 will win in MT as it has 24 threads. But Apple's single-threaded performance is industry leading.
ConsistencyWelder@reddit
Yeah, but the AMD DOES offer you more threads. For less money, mind you.
It's the same argument with Lunar Lake: it has terrible MT performance for its price class. "Yeah, but it has way fewer cores."
Exactly. It has way fewer cores, but it costs $200-300 more.
Famous_Wolverine3203@reddit
You have consistently parroted this LNL argument multiple times in other threads.
Lunar Lake is great for notebooks because it has way better battery life than Strix, as well as better iGPU performance and efficiency.
As for the HX370, offering more threads than the M4 seems useless anyway since the M4 has the same Cinebench multithreaded performance as the HX370 (33W) while using much less power (20-25W).
And it has much worse ST, much worse battery life and a poorer iGPU than M4.
ConsistencyWelder@reddit
They all have pros and cons, except for Lunar Lake, which is just worse all around. Maybe, just maybe, it has good battery life, but it's still worse than a MacBook. At a higher price.
The M4 has better ST but the same MT as the HX370. But on the HX370 you can game; gaming is a joke on the Mac, so that's not an option for me.
The HX370 is the best mix of performance, gaming capability, and good battery life, and at a good price point. It should be what we recommend to most people, at least if they want to be able to do some gaming.
Famous_Wolverine3203@reddit
HX370 does not have remotely good battery life compared to these two. Wtf. A 50% disparity in battery life is not what you consider good.
As for gaming, Lunar Lake is faster and more efficient, so why would I recommend HX370 over LNL.
From what I understand of your post history, you have an obsession with Ryzen mini PCs.
Makes sense why you’re going to every thread acting like the HX370 is the best all round product when it really isn’t.
Also Lunar Lake goes for the same price as HX370 laptops. What are you even on about?
ConsistencyWelder@reddit
It's not 50% though.
I would not, and never have, recommended Arc graphics for gaming. Even if it was faster, there'd be too many issues and games that refuse to run. Intel has been making graphics drivers longer than AMD, they've just always sucked at it.
Ok, you're seriously creeping me out here, going through my comment history.
Wth? Lunar Lake laptops typically go for $200-300 more than similarly configured Strix Point laptops. I repeat, wth?
Famous_Wolverine3203@reddit
It is. I literally linked Geekerwan’s review in the previous comment where it is a 50% advantage for LNL and M4 over Strix Point. Did you ignore it or pretend it doesn’t exist?
Seems like a you problem here, friend.
This is again another straight up lie. This is frankly so blatant I have no idea how you had the cajones to type this out and not be called ignorant at best or straight up maliciously lying at worst.
Here’s two Zenbook models.
https://shop.asus.com/us/90nb13m3-m00790-asus-zenbook-s-16-um5606.html?srsltid=AfmBOooLdXHGFpMIyJtz8JGZaf_8pEexNFh0vGsSF12TLDZdVhTrdpP6
https://shop.asus.com/us/90nb14f4-m00620-asus-zenbook-s-14-ux5406.html
The S16 with the HX370 is 200 dollars more expensive because it has a larger screen. But other than that, the screen resolution, memory, storage, and connectivity are all the same.
Tell me again how Lunar Lake is more expensive? Two Zenbook models from Asus with only the screen size differing between them.
You are literally lying for some reason to cover for AMD. No wonder.
ConsistencyWelder@reddit
Again, you keep posting misleading "facts" and digging into other users' comment history. I think you need to recheck your priorities.
auradragon1@reddit
Yes, 7% better battery life, but the X Elite is 66% faster while on battery.
Source: PC World Youtube LNL review
Famous_Wolverine3203@reddit
X Elite also cannot run any games anywhere near as good as Lunar Lake and has major compatibility issues making it a halo product anyway.
chapstickbomber@reddit
"much worse" lol
Famous_Wolverine3203@reddit
English isn’t my first language. Apologies.
WHY_DO_I_SHOUT@reddit
I think it makes more sense to match core counts, not thread counts. SMT doesn't give you anywhere near 100% MT boost.
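Quick illustration of why matching thread counts overstates things (the ~25% SMT uplift here is a generic assumption, not a measured number for Zen 5 or any specific chip):

```python
# Back-of-the-envelope "effective cores": SMT adds nowhere near 100% per core.
physical_cores = 12          # HX 370
smt_uplift = 0.25            # assumed extra MT throughput from SMT

effective_cores = physical_cores * (1 + smt_uplift)
print(effective_cores)       # 15.0 "effective" cores, not the 24 threads exposed
```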
obicankenobi@reddit
It makes sense to match the price, actually. You don't really care if one of them has to cost three times more to match the multicore performance.
poopyheadthrowaway@reddit
Yeah, price and TDP (and probably die size) are the things to look at here. Comparing core counts is about as good of a comparison as comparing clock speeds or something.
Quatro_Leches@reddit
I wonder if multithreading is just worth abandoning at this point. Lunar Lake got a HUGE single-core jump gen over gen by abandoning multithreading.
It made sense when all you had were single-core, dual-core, and quad-core systems, but at 8 cores, do you really need it? I'm going to say no, especially if you can get the single-core benefits Lunar Lake did, which, in fact, it obviously did.
Strazdas1@reddit
It is. If you feed your cores properly, MT will actually decrease performance. And for jobs that don't know how to feed cores, you usually just have enough extra cores. MT has already been gimped by security fixes; it's no longer beneficial.
Pristine-Woodpecker@reddit
Pointless to make such general statements: the HT uplift is very very different between AMD chips and Intel chips (and yes, in part due to security mitigations).
Pristine-Woodpecker@reddit
I couldn't make any sense of your post until I realized you mean hyperthreading, not multi-threading.
Famous_Wolverine3203@reddit
Lunar Lake’s single core boost has nothing to do with abandoning multithreading.
It was a design decision for a product that runs in the sub 15W ultrabook segment.
When high-performance laptops come into play, there is a necessity for good multithreading performance in gaming, productivity, etc.
If by multithreading you mean Lion Cove abandoning SMT, that has more to do with die space concerns.
auradragon1@reddit
Based on Notebookcheck's data on Cinebench 2024 for the Asus S16, the HX370 ran MT at 46.75w.
https://www.notebookcheck.net/AMD-Zen-5-Strix-Point-CPU-analysis-Ryzen-AI-9-HX-370-versus-Intel-Core-Ultra-Apple-M3-and-Qualcomm-Snapdragon-X-Elite.868641.0.html
Universal-Cereal-Bus@reddit
Does Cinebench 2024 do different scoring for desktop vs mobile chips?
Because the scores for the m4 are 40% better than a 14900k or 7950x (both roughly 130 for single thread) which seems... incorrect for a chip that runs 70w max power?
996forever@reddit
That max power isn’t very relevant to the single core score.
InclusivePhitness@reddit
Fucking hell Apple, start off by buying Nintendo and Rockstar and launch a new console based on the M4 in the size of an Apple TV. Let us play everything on any device.
demonarc@reddit
Don't think we need a $1000+ console with 512GB of non-user upgradeable storage
theQuandary@reddit
The most likely outcome would be all the Nintendo stuff being made available on every iphone and macbook with an Apple Arcade subscription.
Strazdas1@reddit
More market segmentation is not what we need.
theQuandary@reddit
That segmentation already exists...
JoshRTU@reddit
Nintendo would be a perfect acquisition for apple. Nintendo ip is hampered by its hardware. Imagine every iPhone with a custom Nintendo switch emulator. They would 10x Nintendo’s potential customer base.
hishnash@reddit
What makes Nintendo succeed is the constraints. They create novel games with novel gameplay; I don't know if that would continue if they could just grab the latest Apple silicon chip and ship a new console each year.
BadAdviceAI@reddit
It would cost too much. Zero chance this happens.
InclusivePhitness@reddit
What does “cost too much” mean?
BadAdviceAI@reddit
Are you gonna buy your kid a $1,000 Nintendo?
trillykins@reddit
I'm not sure why anyone would want this. First, Apple doesn't care about games, and second the console would be $2000.
Qaxar@reddit
Are you not familiar with Apple? They would make all the games exclusive to their devices.
ABetterT0m0rr0w@reddit
Yo, they just wishing upon a star.
Eclipsetube@reddit
And Nintendo is making games for all consoles?
okoroezenwa@reddit
Yeah I’m not sure why anyone would bring up exclusivity wrt Nintendo, that’s their thing after all.
If Nintendo were still in their state during the Wii U period somehow, Apple buying them could probably work (for certain definitions anyway)
InclusivePhitness@reddit
Bro it’s just a wet dream. Their hardware is amazing, and this is coming from someone who owns a 7800X3D and 4080 Super, plus a gaming laptop with a 12th gen Intel and 3070 Ti… and of course I own Apple stuff like a MacBook Air, Apple TV, iPhone.
Sure, if I could run AAA games on a Mac mini and also game on the road with a MacBook Pro (same library), I would be in fucking heaven.
jecowa@reddit
If Apple started making Nintendos, the EU would force them to allow side-loading games.
BenignLarency@reddit
You think people were upset by the PS5 Pro? Wait until you see the Apple Nintendo Ultra! $1200 for your console 🤪
jedrider@reddit
We're living in a great age of chip design. Each company has brought good things to the market. Only, Intel has done a face plant on their fabs, but give them credit for almost half a century of design.
super_hot_juice@reddit
Running Cinebench on a laptop is irrelevant to most laptop workflows. If you are a cloud user doing some light Office work, you will not see any benefits compared to the M1 except faster local writes and reads; the SSD will make the major difference. If you are an Office, Lightroom, or Filmora user, you will see some benefits compared to the M1, but you will not extract all of them. If you actually use a laptop to do CPU beauty renders for business, then I don't know what to tell you except good luck; you are wasting your time and straining your eyes. If you keep your laptop as your desktop, connected to multiple screens to do CPU rendering, then you wasted money and are still wasting time.
So Cinebench really doesn't tell you anything relevant.
max1001@reddit
Y'all so gullible. Apple will never make the M4 that powerful. Not because they can't, but because it would hurt their bottom line.
They don't even need to beat Intel or AMD in any benchmark. People buy Macs for the ecosystem and OS, not because of the benchmark scores.
hishnash@reddit
Why would it hurt apples bottom line?
The people upgrading to an M3 Pro or Max are not upgrading for single-core clock speed (the single-core clock speed is the same on all of them).
The reason you get a Pro or Max is for more memory, more GPU grunt, more displays, more TB ports, and maybe in a very small number of cases more multi-threaded perf.
But single-threaded perf is the same across the line, as they use the same core design running at the same clocks with the same memory clock speeds.
max1001@reddit
Apple doesn't do huge generational uplifts. They are only competing with themselves. They are not competing with AMD or Intel.
hishnash@reddit
They are competing, in the main market that matters (laptops).
Generation over generation Apple is pushing 10 to 15% each time (a good bit better than Intel and recent AMD updates).
max1001@reddit
Like you said, a 10-15 percent bump, not the 20 percent cited in the article. The M3 could have been way beefier considering they went from 5nm to 3nm, but they chose to make it just 10 percent faster.
hishnash@reddit
The 3nm that the M3 used was a very early gen, and while the transistors were smaller, the space between them was not much smaller than on 5nm (so you can make smaller transistors, but in the end you can't fit that many more of them within the same area).
Also the yields on first-gen 3nm were not that good, and yields are non-linear with chip size, so making a wider core design would have meant a larger chip with much lower yields (a massive increase in cost).
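To make the non-linear-yield point concrete, here's a minimal sketch with the textbook first-order (Poisson) yield model; the defect density is an illustrative guess for an immature node, not a real N3B figure:

```python
import math

# First-order Poisson yield model: Y = exp(-area * defect_density).
def die_yield(area_cm2: float, d0_per_cm2: float) -> float:
    return math.exp(-area_cm2 * d0_per_cm2)

d0 = 0.5                                   # assumed defects per cm^2
for area in (1.0, 1.5, 2.0):               # die area in cm^2
    print(f"{area:.1f} cm^2 -> {die_yield(area, d0):.0%} yield")
# 1.0 cm^2 -> 61%, 1.5 cm^2 -> 47%, 2.0 cm^2 -> 37%
```

Doubling the die area drops yield far more than 2x here, which is the cost pressure being described.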
Successful_Bowler728@reddit
Crushing in dreams. Test the thing in Blender or Photoshop and we'll see it's a scam test.
hishnash@reddit
This is a single-threaded test, and it will have very good results in Blender (CPU, single-threaded). Photoshop is more multi-threaded, so yes, expect a much higher core count part (like an M3 Max or M2 Ultra) to outperform it for sure.
Successful_Bowler728@reddit
I want to see a video test showing progress bar on photoshop. Too many suspicious number charts.
BadAdviceAI@reddit
It is on a 3nm node, so of course it'll be faster. Make the other two on 3nm and they will likely be competitive.
hishnash@reddit
Well the fact is the others are not on that node and will not be there for the next year+.
Being on a bleeding-edge node with a large chip is not easy. It's not just a matter of selecting it in a dropdown; you need to put in a lot of up-front work to ensure your design is much more robust to yield issues (otherwise you're not going to be able to run it at the speed you want)...
Why can Apple use these nodes years before AMD and Intel? Well, Apple have MUCH higher IPC: they have opted for a much wider core, so they can get high single-core performance without trying to push extremely high clock speeds (and the related voltages).
Neither AMD nor Intel could fab their current designs on these nodes and get the clock speeds they would need (at volume production) to be able to compete.
BadAdviceAI@reddit
AMD just released 3nm server chips that spank anything Apple can offer, including other ARM CPUs.
The point is that the M chips aren't that impressive.
hishnash@reddit
Yes, but only with the alternative (compact) core design (which is clock limited). The single-core speed of those server chips is likely about half that of these M4 chips. Those server chips will spank the M4 in multi-core since they have hundreds of cores and the M4 does not.
AMD's high-clock-rate core designs are not compatible with (do not have good enough yields on) 3nm.
BadAdviceAI@reddit
Lol. The new EPYC chips literally smash them and there's no comparison in performance per watt.
The M4 is a cool chip, but it's mainly got a node advantage.
hishnash@reddit
Per-core performance on the 3nm EPYCs (the ones using AMD's C cores) is very poor (worse than Apple's E cores), and they draw more power than Apple's E cores.
C-core-only CPUs will have better perf/W than P cores (for multi-core, fully separable problems) but would be a lot worse than an Apple E-core-only part.
As I said, the advantage here is that the core design lets them use the latest node; AMD's high-performance (high clock speed) cores can't be made with good yields on 3nm for at least another 12 months, if not 24.
This is not a matter of selecting 3nm from a dropdown menu when ordering the parts; that is not how silicon design works.
Quatro_Leches@reddit
How does the M4 compete GPU-wise? It has a weaker NPU than either of the others.
hishnash@reddit
Measuring NPUs is very, very `subjective`. We will need to see how well it runs given models, but given that there is not even an industry standard for what 8-bit int or 4-bit or 8-bit float even means on an NPU, TOPS figures are not at all comparable.
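A minimal sketch of why the headline figures diverge; the MAC count and clock are hypothetical, and the 2x packing of INT4 ops per INT8 MAC is a common design choice, not something every NPU does:

```python
# The same silicon can be quoted at very different TOPS depending on the
# precision the marketing number assumes.
def tops(mac_units: int, clock_ghz: float) -> float:
    ops_per_mac = 2                      # one multiply-accumulate = 2 ops
    return mac_units * clock_ghz * ops_per_mac / 1000

int8_macs = 16_384                       # hypothetical INT8 MAC array
clock_ghz = 1.2

print(tops(int8_macs, clock_ghz))        # ~39 "TOPS" quoted at INT8
print(tops(int8_macs * 2, clock_ghz))    # ~79 "TOPS" if INT4 ops pack 2-per-MAC
```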
TwelveSilverSwords@reddit
In terms of GPU, the M3 matches LNL in performance and slightly beats it in performance-per-watt.
Source: https://youtu.be/ymoiWv9BF7Q?si=kSHclVmd7DatXjNY
auradragon1@reddit
Your link doesn't show that.
VastTension6022@reddit
14:47 – seems about right
auradragon1@reddit
It shows that M3 GPU is 25% faster at 25w.
How does 25% = match?
VastTension6022@reddit
They're similar at peak performance, the 25% when power limited is where it beats it in perf/w?
auradragon1@reddit
Actually, if you look at 16:12, he has the data in graph format.
LNL: 3246 @ 30w
M3: 3383 @ 20w
The way I see it is that LNL is slightly slower than M3 while using 50% more power than M3.
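Same data turned into points-per-watt (just arithmetic on the two numbers quoted above):

```python
# Points-per-watt from the two data points quoted above (Geekerwan, 16:12).
lnl_score, lnl_watts = 3246, 30
m3_score, m3_watts = 3383, 20

lnl_eff = lnl_score / lnl_watts          # ~108 pts/W
m3_eff = m3_score / m3_watts             # ~169 pts/W

print(f"M3 perf/W advantage: {m3_eff / lnl_eff:.2f}x")          # ~1.56x
print(f"LNL extra power draw: {lnl_watts / m3_watts - 1:.0%}")  # 50%
```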
SERIVUBSEV@reddit
It has the correct amount of NPU, actually. Every Windows laptop used to have something similar until MS made 40 TOPS the minimum requirement for "Copilot+ PCs" and the Recall feature.
At 40-50 TOPS, NPUs currently take up a similar area on die as the main CPU cores, which is ridiculous.
Plank_With_A_Nail_In@reddit
What exactly are you going to run on the M4 GPU?
Dangerous-Fennel5751@reddit
Games for example. Whisky works very well.
Famous_Wolverine3203@reddit
It should have the fastest 3D rendering performance of all iGPUs in the desktop market and the second slowest (better than X Elite) gaming performance of all these iGPUs.
moxyte@reddit
Very cool, but will it get rid of that dumb notch on the display?
hishnash@reddit
You mean increase the top bezel? Why would you do that? That would reduce the usable display area, since macOS puts the menu bar at the top of the screen (not the top of your window), so the notch is only an issue for apps with many, many items in the file/edit menus (not common), and those flow around the notch, not under it.
jorel43@reddit
What's the point of these comparisons? Apple doesn't have to worry about anything being compatible with their solutions; everything can just be blown away and redone in the next generation, and you have no backwards compatibility. Also, they still don't have any utility since x86 software doesn't work on them. Apple has something like 14% market share, and that's for a reason.
hishnash@reddit
Apple has very good backward compatibility, and x86 software does work rather well. Apple is not MS; MS screwed up, but Apple have done enough CPU transitions to do this very well.
JohanKeg@reddit
"x86 software doesn't work on it"
What? Most of the Adobe apps I used were still x86 and were translated via Rosetta; they worked, and I still have lots of apps that are not native on ARM. That's plainly false.
auradragon1@reddit
All Adobe software on Mac is ARM native except for 3 niche ones. https://helpx.adobe.com/sg/download-install/kb/apple-silicon-m1-chip.html
Why are most of your Adobe apps still using translation?
JohanKeg@reddit
They weren't when I bought into the M system post-2020.
The Substance 3D suite had its native update in 2021, I think.
Elios000@reddit
I welcome our new ARM overlords. I'd love to see what an ARM chip at like 200W could do. Right now the M4's kicking the crap out of x86 with both arms (haha) tied behind its back. Let's see what an ARM chip with a 250W thermal budget and no power limit can do.
hishnash@reddit
Depends on your use case. For single-core perf there is no point throwing that much power at it, as you're not going to get much of an improvement; the silicon becomes very non-linear in perf versus power once you start overclocking like that.
For multi-core, just look at the server space with 512-core server chips.
max1001@reddit
You don't need to wait. All the cloud providers have arm based server chips.
GarbageContent823@reddit
Apple hardware cannot handle ~10 billion rays per second.
Not even AMD or Nvidia GPUs can do this.
Imagine doing 10 billion rays per second in something like Cinebench. Oh yeah.
The Lord of the Rings movies are a joke against this result.
Helpful-Artist-9920@reddit
apple will never be the premier gaming platform
hishnash@reddit
There is more to the world than gaming.
mi7chy@reddit
Looking forward to buying a new laptop just to run synthetic benchmarks.
hishnash@reddit
it will be rather good in regular day to day tasks as well.
pianobench007@reddit
How useful is the cinebench result for media heavy users?
What is a better benchmark for gamers?
And what about youtube and powerpoint/excel heavy users? What is a good benchmark for them?
How about CAD-heavy users or 3D modelers that don't render a scene but instead work on heavy models with tons of vectors?
Is everyone in r/hardware a media heavy user who uses cinebench only? I am genuinely curious as I see this benchmark a ton. But it doesn't reflect my real world gaming usage.
Please help.
Sopel97@reddit
everyone in r/hardware is an armchair 24/7 YouTube watcher who eats up synthetic benchmarks to feel like they are using their computer for something
auradragon1@reddit
Funny because people claimed that Cinebench is better than Geekbench because it's not synthetic. Now you're saying it is.
Sopel97@reddit
imo geekbench could have been more relevant because it's based on real-world workloads, but in the end it fails miserably because it aggregates the results over too wide spectrum of software for the final score to be useful. Why someone would consider geekbench synthetic is a bit beyond me.
auradragon1@reddit
You do realize that SPEC also aggregates results right?
Sopel97@reddit
yes, that's why I don't consider it relevant
Pristine-Woodpecker@reddit
I don't understand your argument. My best guess is that you think the subtests shouldn't have equal weighting. But that seems like an extremely POV argument.
Sopel97@reddit
different weighting would not make it more relevant, no. What would make it more relevant is scores for individual software with clearly defined workloads.
Pristine-Woodpecker@reddit
But SPEC already has that?
Sopel97@reddit
he asked about the aggregated score of SPEC
Plank_With_A_Nail_In@reddit
What games can you play on a Mac? If you are thinking of buying a Mac to play games you are dumb as a rock.
notam00se@reddit
~70 games in my steam list are available in macOS. Everything I've played in the last 3-4 years are playable, but I stay away from AAA/EA/FOTM gaming.
obicankenobi@reddit
Been using Cinebench to guess my performance in both V-Ray rendering and CAD performance in Rhino 3D. Multicore performance usually tracks perfectly with what I get in V-Ray while single core performance matches what I get while using Rhino 3D, whose operations are mostly single threaded, so, Cinebench is actually quite a good indicator for what I'm going to get.
Sopel97@reddit
why would you use a CPU in v-ray?
obicankenobi@reddit
It is more reliable; the GPU gives all sorts of errors and may decide to run very slowly for whatever reason. Also, your scene has to fit in GPU memory, otherwise it won't render at all.
Also, V-Ray GPU vs. CPU isn't exactly the same; it's very easily noticeable if you have some frosted-glass kind of materials in your scene, as the GPU engine renders those very badly.
I use the GPU to render all the time but occasionally, I have to fall back to the CPU.
Sopel97@reddit
I've seen some issues with the GPU renderer historically but thought they got resolved eventually, reaching a similar performance ratio to what Blender's Cycles renderer achieves, for example. A bit of a bummer, thanks for clarifying.
obicankenobi@reddit
The best part of Blender (and Cycles specifically) is that it gives you the exact same image whether you render on CPU or GPU. However, due to my workflow, I'd still rather use V-Ray on Rhino so that I can see my rendered results in real time as I work on the design.
trillykins@reddit
It's kind of useless. You need to find real-world benchmarks for your specific purpose.
CalmSpinach2140@reddit
Cinebench 2024 is useful for people who use Maxon's Redshift renderer. The base M4 is also great at Office work, the M4 has excellent web browsing performance (for that you can use Speedometer), and the Adobe apps like Photoshop and Lightroom etc. are super optimised for Apple's chips. Same goes for video editing software like Resolve. For tasks like these the M4-powered MacBook is great.
For gaming I would stick with x86, and for other niche x86-only applications I would also pick up an x86 laptop.
996forever@reddit
For gaming usage you can look at individual gaming benchmarks; there is no shortage of those.
no_salty_no_jealousy@reddit
Look at all these toxic Apple cultists and stockholders downvoting everyone who says a bad thing about Apple even though it's the truth. This sub became cancerous because of those jerk fanbois. It's getting really pathetic!!!
eriksp92@reddit
I don't understand how we can be reading the same sub - the toxic ones are almost exclusively the Apple haters who refuse to accept when Apple does something impressive, and who use every article to yell about their grievances, rational or not.
okoroezenwa@reddit
And that person is almost always in those threads. It’s hilarious, especially given their user name.
Dependent_Big_3793@reddit
It is very bad for Lunar Lake. At least the HX370 isn't using battery life as its selling point; it can be paired with a dGPU and strong MT performance for gaming laptops, and even without a dGPU it still provides good graphics and CPU performance at a lower price. MacBooks have always used battery life as a selling point, and I think the M4 provides better CPU performance with the same battery life as Lunar Lake. Lunar Lake will find it very hard to compete with the M4.
Stark2G_Free_Money@reddit
If they would just finally make a potent GPU that can run with things like the 4090, the MacBook would be perfect.
itastesok@reddit
They have a potent GPU. That's not the issue.
Stark2G_Free_Money@reddit
Hey, I have an M3 Max MacBook Pro myself. I know their top-of-the-line GPU pretty well. It's nice and all, but pushing pixels at 4K is not really a great endeavour with the 40-core GPU on my M3 Max.
I have first-hand experience with it. Trust me, it kinda sucks. Especially for the price of nearly 5000€ I paid for it, at least compared to what Windows laptops offer in this price range.
no_salty_no_jealousy@reddit
Meh, no power test. The Mac M3 Pro can use up to 60W sustained; this one must be using more watts than Lunar Lake, so the score is not surprising.
RegularCircumstances@reddit
The M4 in an iPad, when run in short bursts and cooled, used about 25-30W for MT, which is totally within Lunar Lake’s ballpark for MT turbo.
It’s an M4, not an M4 Pro. It’s just in a MacBook Pro.
lol