Intel CEO Lip-Bu Tan stamps out chip bugs with aggressive new quality standards, says major validation errors can result in termination — 'B0, you keep your job. Anything above that, you are fired'
Posted by CopperSharkk@reddit | hardware | 206 comments
gumol@reddit
Ah yes, that'll surely create a healthy work environment. There's a reason blameless postmortems were invented.
260X@reddit
After the RPL disaster that turned out to be a PR nightmare for Intel and resulted in Gelsinger's 'early retirement,' I can't fully blame a new CEO for creating a toxic environment to keep the production quality in check.
imaginary_num6er@reddit
Gelsinger was kicked out because of his charity GPU product line
Exist50@reddit
Nah, it's the IFS side that got him canned. If anything, investors were mad he threw all their money into the foundry black hole and missed the AI train in the process.
laffer1@reddit
Intel didn’t miss the ai train, they just sucked at it. They had ai accelerators and the miss happened before pat was there
Exist50@reddit
Part of the same. The point was Pat was punished for not having a real product for AI. He even bet against the boom we now see with agentic AI.
laffer1@reddit
Intel had gaudi before pat joined again. They had a product.
Exist50@reddit
People weren't actually buying it.
Grizknot@reddit
so many contradictory comments here, He was fired bec he created a GPU division, no he was fired bec he didn't create a GPU division, he was fired for making bad CPUs no he was fired for not making enough bad CPUs
Geddagod@reddit
It did look like he created a GPU division.
Problem is that they couldn't actually ship the stuff from that division. Delays and issues. So in a sense, it's a double whammy. It might genuinely have been better if he didn't create the division at all and waste resources doing so, than to have done so and failed.
Yes, both of those are true, and aren't necessarily contradictory.
Even with bad products, Intel has typically flooded the market to keep unit share, like we saw with DCAI in the past. Intel still ships tons of 'bad' products.
The problem is that Intel wasn't doing that with Intel 3 stuff, since Pat assumed they would have a bunch of 18A volume and products out by now, or very soon.
So really Pat messed up here in 2 ways, one not predicting the AI CPU boom (to be fair, did anyone else really do that either?).
And two, planning node capacity in such a way that they aren't able to supply their worse CPUs in a supply shortage, at least without more lag time than necessary.
Reducing Intel 7 capacity is definitely more excusable though, it's really how they planned for Intel 3 and 18A that should be criticized.
Vigilant256@reddit
I can understand you didn’t see the AI boom, but why couldn’t you pivot fast enough? I doubt AMD foresaw the AI boom either, but they quickly pivoted towards it and have a product as well.
Exist50@reddit
I mean, it seems simple. The two main reasons he was fired were overspending and underdelivering on Foundry, and failing to have a meaningful product offering for the AI boom.
Exist50@reddit
RPL is the least of their problems. Look at SPR. It took like 15 steppings to get out the door, which delayed it 1-2 years. That's what they want to avoid.
Granted, that was largely because BK had the bright idea of laying off Intel's server pre-silicon validation team. After all, quality was so good, who needs them!
Demian52@reddit
Funny thing about that, I was on DMRs validation team and got laid off after training on its architecture for a year. In fact, a ton of the validation staff on that project were cut. History repeats itself I suppose.
Seanspeed@reddit
Gelsinger wasn't sacked cuz of some 13900k's having issues. lol
R-ten-K@reddit
No. Gelsinger was sacked because he had shown a pattern of consistently missing major trends and the board, and most large investors, lost confidence in his tenure (and rightfully so).
thegammaray@reddit
Is that really why? To me it always seemed more likely that the Falcon Shores failure/cancellation got him canned.
Geddagod@reddit
I mean Gelsinger warned everyone that it was a long term project. The problem wasn't the lack of financial rewards yet, it was the lack of... any sort of "rewards" yet. No customers, no process leadership, and delays.
spicesucker@reddit
At the same time “No Blame” culture has largely been replaced by “Just Blame”, they’re similar but if you should have known better you’ll still get it in the neck
Panaka@reddit
“No Blame” culture has always had a carve out for negligence. This is just MBAs rebranding a decades old concept as something new.
Free-Competition-241@reddit
You hear accountability and call it blame.
nittanyofthings@reddit
But "just blame" is the ideal right? Tan's problem is he is using the end result as the determinant. You can do the right thing and still have it not work out.
Exist50@reddit
How?
Far_Piano4176@reddit
i think they interpreted "just blame" as "being just in the assignment of blame" rather than "only blame", but the latter is what spicesucker meant.
Free-Competition-241@reddit
Making computer chips isn’t about cuddly wuddly post mortem meetings.
“Hey Bob, we don’t want to point fingers, but it turns out the space shuttle blew up due to a bug you introduced.”
whyte_ryce@reddit
It’s not completely wrong. Intel engineers got too used to having a captive fab and having projects with like 8 steppings. Not enough reliance on quality pre si to stamp out bugs, in the distant past at least. Lots of PM mishandling and politics over not dropping features that weren’t ready to go on A0 and would be enabled for the first time on B0. Apparently one of the things Keller tried to stress was that this isn’t how the rest of the industry worked and they needed to shift to a “if it’s not ready it doesn’t make the cut” mindset
Lip seems to have just taken that to an extreme
Exist50@reddit
It's a complicated problem, but Intel's historically understaffed in pre-Si validation. I mentioned in another comment that Intel's server troubles can be traced in large part to Intel largely laying off the server pre-Si val team under BK, but even on the client side, their staffing is more like 1:2 dev:val vs something like 2:1 at the likes of Apple.
Also, the fab might have enabled more steppings for "cheap", but they also drive additional ones themselves. No other fab would introduce so many late, design breaking changes and expect the design team to keep up.
whyte_ryce@reddit
Intel brought in a lot of people that actively disliked validation and thought validation spent a lot of time doing science experiments and finding useless bugs no customer would actually see.
max123246@reddit
Validation work almost always gets paid less because of archaic ideas of who the "value producers" vs "cost centers" are at a company. Complete bollocks but you see people denigrate support engineers and infra to this day
Exist50@reddit
IVE? Heard they often got the short end of the stick.
Remarkable-Deer-6721@reddit
You are kissing the hand of your slaver after he crushed your teeth. Disgusting
hollow_bridge@reddit
expect more of this, the new policy will significantly increase slowdowns.
Capital-Froyo-4359@reddit
Not sure why you're acting like getting fired for doing a bad job is some crazy idea. Where have you ever worked that people never face responsibility?
Exist50@reddit
The question is who gets the blame. Design for writing the bug? DV for not finding it? Arch for feature complexity? If one's job is at stake, a lot of effort will be spent assigning blame.
SlamedCards@reddit
Lip-Bu addressed it during the interview: he only wants to hear bad news. Share problems so it's everybody's problem. It's when you hide things that you get canned
Exist50@reddit
That's good. And I do think he doesn't mean this statement as something ICs should worry about. Project managers, maybe, but then that's as reasonable a point as any to assign blame.
capacity04@reddit
Eh, not sure this is a good take. There's a balance. If you want to run an industry leading semiconductor company you need to have high standards
elkond@reddit
Lip Bu fired people on the IC levels and hired VPs. So even if u try to give it a generous read, his hiring practice makes it impossible to make it a reality
Vigilant256@reddit
This statement is not fully accurate; Lip-Bu removed quite a number of VPs and very high level ICs. These high level ICs do not do much engineering and work more on PowerPoint slides.
elkond@reddit
dude comes in. says "im flattening the management structure". proceeds to fire ICs, lowest level managers (who were effectively ICs too), ye some VPs were let go too, but number of those hired outweighed the fired ones (and i saw far more saying "ye i took position at X" rather than getting the boot)
those high level ICs now are core people at companies like Cerebras. this is just an old man doing old, "strong man" management style disaster
Vigilant256@reddit
No the VPs who were let go were definitely more than hired. Let go here means those that say “retired” and those that knew they are going to get the axe and quit to go another place.
Intel VPs were bloated, in 2024 they had like 120 VPs. In comparison AMD only has 20+ VPs. Apple had like 40+ VPs. For some reason Pat thought putting in more management could solve the problems.
W0LFSTEN@reddit
How many VPs? In which areas? What was his rationale?
elkond@reddit
ccg software engineering, more than 10, fuck if i know
hollow_bridge@reddit
Timing is completely wrong, if they did this last year fine; with the current bubble profits this is just going to make employees want to leave.
mattybrad@reddit
This is how a lot of high paying jobs are. They pay you a ton because you need to do hard things. If you can’t do the hard things, then what are they paying you for?
doscomputer@reddit
seems like everyone forgot about the defective 13900k/14900ks
DeuzExMachina_@reddit
The same type of environment that got Intel into its current state under Krzanich
gburdell@reddit
Apparently you never heard of Andrew Grove
Immediate_Fig_9405@reddit
I would sell all stocks
Wisniaksiadz@reddit
I dream one day the whole world will just start using poka-yoke and make systems where human error doesn't break shit up
lukfi89@reddit
Beatings will continue until morale improves.
mennydrives@reddit
Honestly, they deserved this. Intel's designers have basically forced the fab team to fix their fuck-ups for years.
Now they have to play catch-up to AMD 'n Nvidia.
W0LFSTEN@reddit
Morale is quite high at Intel... If you have RSUs. Not sure how the plebs are feeling though.
wrhollin@reddit
Well, the techs would feel better if you didn't call them plebs.
W0LFSTEN@reddit
The plebs will grow to understand their place
INITMalcanis@reddit
"Don't report anything above a B0 issue, got it"
PilgrimInGrey@reddit
Expert chip designers in the comments. Industry standard is an A0 tapeout. It means highest quality and wide pre-silicon validation coverage. SPR went until C0. Everything after MTL have been B0 steppings. LBT is enforcing what industry follows within Intel. Of course whiners here have no idea how the industry works.
R-ten-K@reddit
FWIW “A0 tapeout” is common industry shorthand, but not an "industry standard."
Most major vendors have their own proprietary validation, bringup, and FA flows, etc. along with their own internal terminology, release criteria, and stepping definitions. A lot of that development and debug process is kept very close to the chest.
There are some common conventions, however, an NVDA stepping label, validation milestone, or debug flow should not be assumed to map cleanly to Intel’s or AMD’s. The same terms can carry very different implications depending on the company, product line, and internal process.
ACuriousIdiotDev@reddit
Thanks Claude
PilgrimInGrey@reddit
Thanks for trying to explain my work to me
R-ten-K@reddit
you're welcome.
PilgrimInGrey@reddit
You did a really bad job anyway
R-ten-K@reddit
K.
LOL
gburdell@reddit
I've been involved in a shitload of tape-outs at a few different companies and the best I saw was B-1 (a couple of test chips -> A0 -> B0 -> B1)
R-ten-K@reddit
Your experience tracks. Studies show 80-90% of ASIC/SoC programs still require at least some level of respin, errata closure, package/process tuning, etc after bringup.
First silicon success is rare for any design of moderate complexity, regardless of team/organization. I have no idea what some commenters were smoking.
Even on some of the most aggressive execution-focused teams I’ve worked with, nobody realistically assumed A0 would pass. There’s a reason there are so many packed daily flights between Taipei and the Bay Area during bringup season lol.
ElementII5@reddit
So the question arises: is doing process validation through production silicon smart or lazy?
I guess TSMC using test chips is unavoidable as they are a pure foundry. Intel doing it through production silicon on one hand dials in the process to their product, but on the other hand constrains the overall broad applicability for external customers?
hwgod@reddit
SPR went well beyond C0.
Absolutely not B0. B-step, maybe. If you ignore the interleaving steppings for -U vs -H or the steppings for ARL-20A.
PilgrimInGrey@reddit
I was on SPR
Geddagod@reddit
Seeing how SPR went, makes sense.
CopperSharkk@reddit (OP)
ARL LNL and PTL shipped at B0
hwgod@reddit
B-step, yes, but I don't think B0. Intel doesn't also publicize every stepping they make (they use steppings as more of a public versioning system), but sometimes it shows up as a suffix.
Fr0stCy@reddit
You’re right. A0 to production happens when management properly commits resources for Formal Verification.
Also important to define what is an acceptable risk and what isn’t, because you can’t catch everything in FV.
jedijackattack1@reddit
A0 is a hopeful standard. But for any high risk product, or a product with a large number of new features, B0 tends to be the expectation, with A0 being a massive win if it's possible to do it. I do love my A0 firmware workarounds.
darknecross@reddit
Agree, sometimes we weren’t even feature complete until B0…
onnie81@reddit
Haswell was stepping 2 H4, that transactional memory bug didn’t live well
OttawaDog@reddit
Next he will fire anyone that takes extra time to get a chip project done.
Fired if you don't take extra time to find bugs, fired if it takes you extra time to find bugs.
Raigarak@reddit
Yeah, not like TSMC makes their employees be on call 24/7 and probably have an even stricter policy.
KellyShepardRepublic@reddit
They stick to making the chips though, not the whole stack. That allows them to not care for their employees as long as management is competent in the role.
iSGAFF@reddit
Read this as “Incel CEO”. Was very disappointed. Twice.
Capital-Froyo-4359@reddit
Diamond Rapids delays gotta be costing them Billions. I can certainly understand the concern.
Exist50@reddit
Well that isn't quite the same thing. If you insist on few steppings, a natural consequence is taking longer to ensure quality on the first stepping. Delays are fundamentally about mismatches between expectation and execution reality. Whether that manifests as more steppings, pushing A0/B0, more bugs, or all of the above, the root problem is that mismatch.
Vigilant256@reddit
Then they’ll compare your teams capability with the competition. If the competition can do it with less headcount, why can’t you?
Interesting-Rock2474@reddit
Should you find out that the competition is more capable, it does not change your capability. Should you want to lower the number of steppings, your validation needs to be better.
You cannot simply say, look at the competition, they can do it in x (time/steppings), so we should also be capable of that. Improvement comes when you examine your processes and try to improve them.
Intel has had multiple problems with silicon validation, for example (non-exhaustive): lots of custom silicon not synthesised, lots of design creep/changes in general, and a chip design validation program worse than industry standard
Vigilant256@reddit
Then fix it. If you have more VPs, more fellows, more management then why can’t you fix it? And why are your processes behind your competition despite being established 10-20 years earlier than your competition?
Minced-Juice@reddit
The "delay" is referring to 384 core and 512 core variants which weren't known to exist till the rumors of this delay started appearing.
Before this there was still confusion on whether DMR would be 192-core or 256-core.
Geddagod@reddit
Depends on what the root cause of DMR delays are. Maybe Intel throws the packaging team under the bus publicly like they did for CWF.
callmedaddyshark@reddit
If your kids are never allowed to make mistakes, you don't raise perfect kids, you raise liars
ResponsibleJudge3172@reddit
A0, A1 and so on are mistakes. B0 is major revisions.
Anything after? Well, who knows. But you completely kill cadence with that. Finding and sorting out any major bug in the B0 is likely the target. Don't do piecemeal prep launch steppings
Pretty sure someone pointed out that either H100 or A100 was A0, for comparison.
Exist50@reddit
B-step is more the standard for complex SoCs, but Nvidia does often ship on A-step for their GPUs. Not always, but often. GPUs tend to be more amenable to sw/firmware workarounds, however.
max123246@reddit
A good litmus test would be to checkout what step Nvidia shipped Grace-Hopper since that was a CPU
Exist50@reddit
Well, it's 2 separate dies. So each one would have its own stepping. Not sure if anyone's reported on the stepping for Grace though.
max123246@reddit
Yeah, I was more saying whether the CPU Nvidia developed was done faster than Intel CPUs, to rule out the GPU chip being maybe easier to hide defects
ResponsibleJudge3172@reddit
To be fair, the Mediatek CPU-GPU partnership (where I guess Mediatek is responsible for the CPU side) is likely past B stepping and may be at C or more, going by the delays and rumors of showstopping bugs
Jonny_H@reddit
I feel that's more based on the accuracy of their simulation and emulation, and ability to "absorb" and workaround bugs, than anything else.
There's a lot of things that can be "hidden" by a closed graphics driver and shader compiler, after all. Then the only reason to re-spin is if the cost really outweighs the benefits of that particular feature or instruction.
max123246@reddit
Hopper was a pretty big mistake needing software emulation to regain accuracy lost by the hardware implementation for example
Jonny_H@reddit
I really mean emulation of the hardware design before it's actually put into silicon rather than a "possibly incorrect design focus decision" like that.
Possibly simply because I was somewhat separated from that level of decision making - by the time I saw things that was years ago :p
Exist50@reddit
Yeah, beyond just relative complexity, CPUs are kind of screwed in this regard because, unless you're Apple, the software contract is at the ISA level. You can use microcode, but it's not as powerful a tool as a full on driver stack.
max123246@reddit
Also Nvidia's ISA, PTX is not the actual assembly, it compiles to SASS which is a per GPU architecture assembly
ResponsibleJudge3172@reddit
Nvidia also apparently just left 2 TPC non functioning in their AD103 leaving it at 80SM compared to not only rumors, but even the files that were hacked saying 84 SM for AD103 just to prevent delays.
Maybe Lip Bu Tan would be the same. For example, Intel 12th Gen couldn't get AVX512 working. Instead of workarounds that delay things. Disable it from the start and cut off the feature early. Do the final revision by the B0 spin.
RailFan65@reddit
Thankfully the engineers working at Intel are not kids and can understand adult things like high stakes and doing your job correctly to get paid.
W0LFSTEN@reddit
I think Intel is less concerned with raising perfect children and more concerned with having highly technical and competent operations.
By the way, how big of a mistake would an E5 stepping be in children world?
callmedaddyshark@reddit
Consider this scenario: John is human, got unlucky, and introduced a huge bug. It is in Intel's best interests for remediation to start immediately, to limit the damage. However, John knows that when someone finds it was his mistake, he'll lose his job. Here's John's plan:
intronert@reddit
Nailed it!
azn_dude1@reddit
That doesn't explain how someone "getting unlucky" and introducing a huge bug results in an E5 stepping
W0LFSTEN@reddit
He found a bug he introduced after tapeout? It doesn’t really matter what he does after that point, it’s not his job to validate the chip. Out of his hands completely. He can come clean, or he can just wait for someone else to find his colossal fuckup.
Let’s say you’re a doctor. You accidentally leave your medical scissors inside someone’s chest cavity during open heart surgery. Dr John knows that when someone finds his mistake, he’ll lose his job. Here are Dr John’s plans:
Hide the scissors as long as possible
Half ass his job, spend most his time arguing in medical court that it was actually one of many nurses, interns and assistants that did it
Become a farmer
SmokingPuffin@reddit
It's roughly a baseball team full of mistakes. You can't get this level of failure in any one mistake.
b3081a@reddit
B0 is the chance to fix all your mistakes after numerous A* steppings. AMD has been shipping CPU/SoC chips including major changes with B0, and minor changes with A1 or even A0 for years, and there's nothing wrong about this approach.
Vushivushi@reddit
B0 is plenty of tolerance for mistakes?
Rather than encouraging liars, it encourages teams to be honest and realistic with their goals on the get-go.
Arch-by-the-way@reddit
Redditors ask for higher standards then complain when they learn what higher standards look like
jaxspider@reddit
No one in the tech industry hates high standards. There is a way to communicate your message without sounding like a dickhead. You'd think the CEO of one of the two biggest CPU manufacturers on the planet would be such a person. I guess that high standard has yet to be attained.
Exist50@reddit
This has been a major sore point for Intel (SPR shipped on what? E5, F0?), but I'm suspicious he's actually willing to follow through with this threat. Even then, seems more applicable for project management than the rank and file.
jedijackattack1@reddit
Wait they actually got to tape out F? Wtf how do you screw up that bad.
Exist50@reddit
If you include the metal steppings, Sapphire Rapids ended up around 14 or 15, iirc. 12 at minimum. And that was just for the first (real) launch. If they've done more since, I wouldn't know.
Simple. Lay off your server validation team, scope creep to hell and back, have a buggy, PIA process node, and spam steppings as a TTM strategy. And have no real accountability for any of it.
Grizknot@reddit
eli5 what is a stepping? what does this all mean?
droptableadventures@reddit
A stepping is a new version of the same chip with minor changes because they shipped with part of the chip not correct (sometimes it's because they've identified ways to make it better, but usually just bug fixes).
Issues with the CPU can usually be worked around in software but there are often performance tradeoffs as a result. So usually this is a bad thing because it means there's a version of the chip already sold to people with a known problem.
Grizknot@reddit
yikes... happy that I've avoided this since I got an amd cpu 6 years ago and have been happy with it so far
cyborgedbacon@reddit
This isn't strictly limited to Intel; AMD and every other CPU manufacturer release revised versions.
Exist50@reddit
That's not what I meant here. Since the pre-Si validation team was so lacking, and pre-Si in general lets you get fewer cycles in than post-Si, Intel decided that the fastest way to get SPR healthy and to market was to rapidly churn through bug fixes in silicon.
Exist50@reddit
You can think of a processor as consisting of a layer of transistors, topped with multiple layers (about 10-20) of wires. A "stepping" is a revision to the design that changes one of these. The letter (e.g. A, B, C) indicates a change to the transistor layer, while the number indicates a change to the wires, and gets reset on a change to the transistor layer (changes both).
So e.g. B3 stepping would be the 2nd version of the transistor layer, and 4th version of the wires for that transistor layer. Changing the wires (aka metal layer) is easier/faster/cheaper than changing the transistors (aka base layer), but gives you fewer "tools" to fix bugs.
Btw, the other reply misrepresents what I meant by "spamming steppings". Replied to him/her with a better explanation.
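For anyone who wants the letter/number convention above spelled out, here is a minimal Python sketch. It assumes the simple scheme described in this comment (letter = base layer revision, number = metal revision, both counted from the first version). The names `Stepping`, `parse_stepping`, and `is_later` are made up for illustration, and real vendors may label steppings differently, as noted elsewhere in the thread.

```python
import re
from typing import NamedTuple


class Stepping(NamedTuple):
    base: int   # 1 = first version of the transistor (base) layer, i.e. "A"
    metal: int  # 1 = first version of the wires for that base layer, i.e. "0"


def parse_stepping(label: str) -> Stepping:
    """Split a label like 'B3' into base-layer and metal-layer version counts."""
    m = re.fullmatch(r"([A-Z])(\d+)", label.strip().upper())
    if m is None:
        raise ValueError(f"not a letter+number stepping label: {label!r}")
    base = ord(m.group(1)) - ord("A") + 1
    metal = int(m.group(2)) + 1
    return Stepping(base=base, metal=metal)


def is_later(label: str, limit: str) -> bool:
    """True if `label` comes after `limit` (base layer compared first, then metal)."""
    return parse_stepping(label) > parse_stepping(limit)


print(parse_stepping("B3"))  # Stepping(base=2, metal=4)
print(is_later("C0", "B0"))  # True
print(is_later("A1", "B0"))  # False
```

So B3 comes out as 2nd base layer, 4th metal version, and under this ordering C0 is "above B0" while A1 is not.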
Deciheximal144@reddit
Did they ever restore the server validation team?
Exist50@reddit
Keller spent a lot of time building it back up, afaik. DCAI is still a shitshow, but objectively the tape out quality on GNR/SRF is much, much better than it has been for many years.
Granted, last I heard that specific dev team got reassigned to AI stuff, so no idea who's left on server CPU, or what they're working on. Nor do I know what more recent layoffs/attrition have done to that team.
that_dutch_dude@reddit
there is no way that management gets fired before rank and file does.
someusername5873@reddit
Project management aren't the same thing as managers.
https://www.quora.com/What-is-the-difference-between-project-managment-and-managment
that_dutch_dude@reddit
it's hilarious people actually believe those people will get fired before the peasants in the field doing the actual work get booted.
someusername5873@reddit
A tech project manager is more like a restaurant expediter: coordinating the flow of work, tracking blockers, and yelling that the sides aren’t ready for the steak. The GM or kitchen manager manages people. The expediter manages the order flow. Same idea: PMs often manage projects, not employees.
that_dutch_dude@reddit
you desperately try to make the name of the job make a difference. its REALLY simple: targets are not met = the peasants below you get fired. if there is nobody below you: congrats, you are the peasant.
someusername5873@reddit
That's my point and that's /u/Exist50's point. There is no one underneath project management...
that_dutch_dude@reddit
calling something management doesn't make it management. you can call something like "oversight manager for financial transactions at a multi-national corporation" or call it the more common job title: the guy at the drive thru booth at mcdonalds. just like "urban pharmaceutical distribution specialist" and drug dealer are the same job.
someusername5873@reddit
Huh, you're agreeing with me then. That's what I've been saying... that project management is NOT management. Are you able to read?
narwi@reddit
He will. Morale will decrease and errors will increase. After another iteration of Intel downward spiral, they might rethink it.
someusername5873@reddit
Hopefully people don't game it and just reduce the bar for validation...
Exist50@reddit
Well if scaling back scope actually let them ship on time, then that's probably what they should do. If it means customers are the ones who end up finding the bugs, however...
someusername5873@reddit
I’m all for reducing scope. What worries me is cutting corners on validation such that everything looks green, even though the design hasn’t actually been exercised as thoroughly as it needs to be.
mrheosuper@reddit
Something something "When a measure becomes a target, it ceases to be a good measure"
Jonny_H@reddit
This sort of thing "might" work if they can possibly decide blame - but that's functionally impossible for the scale of complexity they're working at.
Even ~10 years ago I worked at a third party IP provider on an Intel project, and it was a mess - every engineer wanted off the product as it was "low status" on the Intel side, there was a constant "blame game" for issues, and it ended up shipping with fundamental features disabled due to what, from my perspective, were entirely Intel issues (PDK and hardening issues - which were out of our hands as we only supplied RTL in the first place).
It was common that the "Engineering Contact" we had changed every month - if we were informed at all and weren't just sending emails to a black hole.
And then even to this day the IP provider was dragged through the mud due to incorrect statements - like we didn't provide driver code or similar - when that was very much not the case.
intronert@reddit
Intel has been known for a long time as having a cutthroat corporate culture, and this will just make it worse. More effort will go towards shifting blame to someone else than to getting to the root causes of the bugs. Those who rise up the corporate ladder will be the best at screwing others and lying to bosses. So, business as usual. :)
jking13@reddit
I remember during the .COM boom hearing that at least in some locations, they had people sitting at the entrance to the building and if you walked in at 8:01, you were written up. People running late would just sit in their car for an hour or two, because by that point, they assumed you had been at another location for meetings and such and just let you walk in.
intronert@reddit
I think this was called the “125%” solution. Intel mgmt decreed that everyone would work 10 hour days instead of 8 hour days, from 8am to 6pm.
I heard the same thing about waiting until the guards left.
awshuck@reddit
“Our systemic failures and horrible culture are now your personal problem”.
DehydratedButTired@reddit
Just sounds like more layoffs, disguised as quality control.
justgord@reddit
.... and now every engineer is too scared to innovate.
scytheavatar@reddit
Intel's main problem which led to their downfall has been their overeagerness to innovate. To the point that they have been out of touch with their customers and what products actually make money. Lunar Lake being too good and a pyrrhic victory as it killed their margins is the best example of this. Maybe having the fear to innovate will be a good thing for Intel, it was something that helped TSMC reach its monopoly.
claytonkb@reddit
^ This is the real truth. Alumni here. It's really just a question of tradeoffs. Sr. management should not push down the consequences of their choices in tradeoffs. If we (engineering) clearly explain the risk/reward ratio to management, we have done our job. The product-planning/market-positioning calls made by Sr. management are risk calls that they make, not the engineering teams. The more "features" and performance requirements you stuff into the next SKU, the more likely it is that one of those features or requirements could result in a stepping-bug. The combined experience of all engineers on the team is insufficient to predict such bugs... precisely because these bugs occur. That is, if it were possible to detect/foresee such bugs (on the given timeline), the masks would never have been shipped to the fab in the first place, the bugs would have been fixed. So, it's an iron-triangle: fast, correct, features -- pick 2. You can have it fast and correct, but drop the features; you can have it correct with all the features, but it's going to take a long time. And if Sr. management demands all the features, and fast, it won't be correct (many steppings)...
W0LFSTEN@reddit
This is a chip design and production company that has had persistent issues designing and producing chips. This is a massive issue for Intel. Holding people accountable for their literal job description is not some outlandish idea. If I cost my company tens of millions, I would fully expect to be laid off. Maybe I’m just old fashioned 🤷
mnmevan@reddit
People in the comments didn’t have to deal with the fallout of the 13th/14th Gen Vmin Shift defect. Intel has made my job hell for the past few years. Enough is enough!
mlloyd@reddit
Oh yeah, that told them! They're definitely going to stop screwing off now!
Seriously, what's going to happen is all the folks who were reinvigorated under the leadership of an actual engineer who invested in product to do it right will go to places that value them while the B and C teams stick around and cut features until B0 or die is possible.
So what that really means isn't B0 or die. It's B0 and Intel dies as they become Cirrus Logic due to an inability to be ambitious enough to keep up with the market leaders.
SubmarineWipers@reddit
An argument can probably be made that you _can_ improve the process, validation, testing and mindset of people, even without creating an atmosphere of a gulag in your company.
Exist50@reddit
RPL was a success, from this perspective. It shipped on B0.
Exist50@reddit
I don't think anyone is disputing that quality matters and that there needs to be accountability for poor work, but for a bug to reach B0, it's already had to pass through several teams, much less individuals. Which one gets the blame? If not none, then all? Well you can't fire everyone, so then what? There's no good answer here, and it's a distraction from the main issue of how to improve quality from a development process perspective.
And if quality is a consistent issue across the entire company, that's clearly no longer an IC problem, but rather a management one. And people are justifiably worried that management failings result in IC fallout.
UpsetKoalaBear@reddit
This is the issue I see.
There is little to no way to hold people accountable for things like this (unless they drop the wafer or break the machine lol).
There are several teams involved before B0, so you would need to find a scapegoat. It’s just hostile all around.
ncook06@reddit
Yeah, I don’t know why people are acting like this is toxic work environment stuff. My last big tech employer had such a positive culture that the core product stagnated and the lack of growth led to mass layoffs. That is a much more toxic work environment.
namotous@reddit
Lolll you want them to do better? You gotta throw them an incentive.
intronert@reddit
Perhaps the loss of a finger for each bug then?
Katent1@reddit
I mean it will apply to management too, right? Right?! xP
ashvy@reddit
Yeup, thems gonna get perf bonus for finding and addressing the "problem"
Polar_Banny@reddit
Hold on, you are too fast with assumptions.
that_dutch_dude@reddit
it should be clarified: Tan is a "shareholders CEO". he is there to make sure the line goes up with no regard for the bodies it leaves in its wake.
intel will not survive him. but the shareholders will cash out before that so they don't care.
W0LFSTEN@reddit
They are down almost 50k employees since 2022 and yet they are pumping out some of their more competent products in years. The company is having difficulty making a profit and yet you insist that for the company to survive they need to care less about the line going up and more about harboring employees who have had no meaningful positive impact on the company’s trajectory.
elkond@reddit
they are pumping out products designed half a decade ago, u seriously underestimate how long the hardware development cycle is
W0LFSTEN@reddit
No, I am not underestimating hardware development cycles. And the majority of the cuts aren’t even from design teams.
Exist50@reddit
Because of or in spite of him?
Even with all their quality problems, Intel Products is still very profitable. It's the fab that's dragging down the company as a whole, and this proclamation doesn't affect that either way.
that_dutch_dude@reddit
Tan is riding the results of Gelsinger. the current "high" is not his doing, it's Pat's.
Vigilant256@reddit
Not fully. Pat did the right thing on the foundry, but didn’t know how to execute. The people that he chose to manage weren’t very good, and I’m surprised that he put a head of people (from HR) in charge of data centre. Then he assumed PC demand would go on forever. He wasn’t very careful with finance either: despite knowing that the foundry would require lots of cash, he didn’t control spending, and instead hired more and more VPs.
Exist50@reddit
Pat, I'd say 50/50. Objectively, his foundry push was a disaster, and not having an AI story is also on him. I think a lot of what Pat said was sensible, but what he did in practice (particularly towards the end), and especially the management he put in charge were not conducive to his goals.
They did see results. Pat gave them a timeline for 18A and the promise of customers. He failed to deliver.
W0LFSTEN@reddit
Tan’s levers in the short term are more operational and financial than engineering. My point was regarding employee count.
Intel Products is doing quite well. The company is finally seeing some meaningful growth due to AI.
Deciheximal144@reddit
I interpreted the user's comment as saying the company should care more about the line going up in the long term than in the short term. That people are making money now and letting the whole thing crash later.
W0LFSTEN@reddit
Caring too much about the long term would have been the quickest way to bankrupt Intel.
The company *needed* to focus on the short term because they weren’t making any money, were losing share, taking on debt and it was in doubt whether they even had a future…
Genuinely, wanting the company to make even more long term investments *than they were already making* would have been one of the most short term solutions the company could have made. It would have burdened the company beyond its means.
that_dutch_dude@reddit
thats not what i said. but thanks for your input.
W0LFSTEN@reddit
>[Tan] is there to make sure line goes up with no regard of the bodies it leaves in its wake. Intel will not survive him
By “line goes up” did you not mean “cares about financial metrics”?
By “bodies it leaves” did you not mean “fired employees”?
By “will not survive him” did you mean he will bankrupt the company?
I would love to hear your insight.
65726973616769747461@reddit
And is that somehow worse than a CEO who just coasts on a golden parachute, perfectly content to watch the company burn, regardless of whether it tanks today or tomorrow?
LuluButterFive@reddit
So he is like every CEO?
Fr0stCy@reddit
A0 to production has been the dream, but it means accepting an extensive burden of Formal Verification. You either need the manpower to work alongside the design team running simulations and Boolean equivalence constantly, or commit to a 4-8 week design freeze at the end where all of that is done.
I’ve been a part of A0 to prod, but it only happens when management properly commits resources to make it happen. If you’re on C0/D0, either the requirements keep changing under you or the management structure doesn’t work.
gomurifle@reddit
He is a finance guy right? Doesn't sound like an engineering background.
bubblesort33@reddit
And then he'll replace you with AI powered by the chip you were designing.
Xanthyria@reddit
Does no one remember the G0 Q6600? That thing was a beast!
Also absurd how far it went on the stepping
MediocreAd8440@reddit
The accidental CEO is at it again
Akeshi@reddit
Ordinarily I'd acknowledge how crazy this is, but I'm still having to put up with a broken 13900k. $4k PC and it can't reliably run even a single command line application.
FastHotEmu@reddit
Ah yes, Lip-Bu Tan, the former CEO of Cadence whose company pleaded guilty to conspiracy to commit export control violations and had to pay $140 million in fines all under his watch.
tverbeure@reddit
Over the years, leather jacket man has done plenty of all-hands employee meetings where he said just the opposite: if you focus too much on accountability, people become too careful, won't take risks, and eventually a competitor will eat your lunch. Which is exactly what happened to Intel.
xXBongSlut420Xx@reddit
fucking ridiculous. This will cause more issues than it solves. People will be afraid to report issues. Any failure should rightly be considered a process failure, not a personal failure. Even if someone does something really stupid, there should have been systems in place to prevent that. Making people afraid to make mistakes does not in fact reduce mistakes.
Nippius@reddit
They looked at Boeing and learned all the wrong lessons...
imaginary_num6er@reddit
So people should already start looking for jobs at AMD if they know a fix is needed, but it hasn’t been brought up yet.
W0LFSTEN@reddit
Why aren’t fixes being brought up? That unironically does seem like a good reason to see your employment terminated.
Exist50@reddit
I think they're taking it too literally, and for the wrong people, but if you assume that bringing up that you introduced a bug gets you fired, then it would make more sense to try to hide it or distance yourself instead. But again, I think that's neither the spirit nor reality of this proclamation.
elkond@reddit
no, bugs sit in jira for years
RealThanny@reddit
I think the claim is that with this incentive structure in place, if you find an error after B0, you'll be fired if you report it. So errors will be hidden, perhaps until a new job is found. Or perhaps forever, until someone else finds it.
It's a real problem with this scenario, at least as described by the headline. If you'll be fired for letting errors get past B0 stepping, then you have zero incentive to point out errors that make it past that stepping. The only way to avoid that, which they may be doing, is having multiple isolated teams checking the same subsystems at different times, so that errors will be found by someone who won't be fired for it.
ListenBeforeSpeaking@reddit
Once they fire the person, I would think they likely fired the person most qualified to fix the problem.
Exist50@reddit
That's a bit of "firefighting by arsonists" though.
ListenBeforeSpeaking@reddit
They didn’t set the fire on purpose.
The problem with complex bugs is that the design is likely complex. They probably spent 12-18 months working on whatever their block is.
I wouldn’t want to be dropped into someone else’s complex problem, told that a solution is needed immediately, and know that if I miss something I’m going to get fired.
W0LFSTEN@reddit
I would question whether the people pointing out the errors are even the ones making them. Not sure how Intel operates, but it is not uncommon for these to be completely different teams.
adamrch@reddit
probably not the same people. But you gotta be a "team player" and not "rock the boat" so you get yourself and others fired for pointing out major issues.
elkond@reddit
Oh I assure you there's no correlation between how well u do ur job at Intel and being terminated xD
neuronez@reddit
Frankly if you need three base layer spins to get your chip right you belong in the software industry, not semiconductors
EmilMR@reddit
with that kind of stress they better pay their verification engineers a proportional salary.
at least he is not talking about AI doing the job.
RailFan65@reddit
Something tells me most people in the comments haven't had a job.
Enemiend@reddit
Let's see if this ends up leading to noticeable activity in the semi-academic hardware verification community a few years down the line. Particularly the hardware verification competition and the like. Though unfortunately a lot of industry-developed tools don't (or cannot) participate. But it would be cool to see some advancements here.
Geddagod@reddit
A lot of the things LBT claims are eerily similar to what Pat Gelsinger also said.
He also constantly talked about how they need to reduce the number of steppings and increase employee accountability, so it's tbd whether things actually improve, and whether LBT's threats here actually result in a meaningful improvement.
CallMePyro@reddit
If anyone makes a mistake, kill them. Public execution. That will make sure everyone is careful and focused!
LuluButterFive@reddit
based on these comments i'm wondering how many people here have actually worked a day in their lives
SmashStrider@reddit
This move seems to be fostering a culture of fear rather than one of collaboration and excellence, something that would just inadvertently encourage a toxic environment or even outright lying to supposedly meet targets...
Specific-Path3179@reddit
It's a sound strategy tbh
ScienceMechEng_Lover@reddit
White people are gonna get a taste of Asian parenting and realise why they're so successful 🔥💪.
Deciheximal144@reddit
That's right, crack that whip!
Hey, why are all the horses leaving?
whyte_ryce@reddit
One Intel project started at something like X0 or Z0 because a major feature wasn't going to work but they wanted something out the door asap to test the other stuff. So the company already has a hack for this mandate
Vaguswarrior@reddit
Ahh toxic work standards. That will help.