Intel CEO Lip-Bu Tan stamps out chip bugs with aggressive new quality standards, says major validation errors can result in termination — 'B0, you keep your job. Anything above that, you are fired'

[-]

gumol@reddit

Ah yes, that'll surely create a healthy work environment. There's a reason blameless postmortems were invented.

[-]

After the RPL disaster that turned out to be a PR nightmare for Intel and resulted in Gelsinger's 'early retirement,' I can't fully blame a new CEO for creating a toxic environment to keep the production quality in check.

[-]

imaginary_num6er@reddit

Gelsinger was kicked out because of his charity GPU product line

[-]

Exist50@reddit

Nah, it's the IFS side that got him canned. If anything, investors were mad he threw all their money into the foundry black hole and missed the AI train in the process.

[-]

laffer1@reddit

Intel didn’t miss the ai train, they just sucked at it. They had ai accelerators and the miss happened before pat was there

[-]

Exist50@reddit

Intel didn’t miss the ai train, they just sucked at it.

Part of the same. The point was Pat was punished for not having a real product for AI. He even bet against the boom we now see with agentic AI.

[-]

laffer1@reddit

Intel had gaudi before pat joined again. They had a product.

[-]

Exist50@reddit

People weren't actually buying it.

[-]

Grizknot@reddit

so many contradictory comments here, He was fired bec he created a GPU division, no he was fired bec he didn't create a GPU division, he was fired for making bad CPUs no he was fired for not making enough bad CPUs

[-]

Geddagod@reddit

He was fired bec he created a GPU division, no he was fired bec he didn't create a GPU division

It did look like he created a GPU division.

Problem is that they couldn't actually ship the stuff from that division. Delays and issues. So in a sense, it's a double whammy. It might genuinely have been better if he didn't create the division at all and waste resources doing so, than to have done so and failed.

he was fired for making bad CPUs no he was fired for not making enough bad CPUs

Yes, both of those are true, and aren't necessarily contradictory.

Even with bad products, Intel has typically flooded the market to keep unit share, like we saw with DCAI in the past. Intel still ships tons of 'bad' products.

The problem is that Intel wasn't doing that with Intel 3 stuff, since Pat assumed they would have a bunch of 18A volume and products out by now, or very soon.

So really Pat messed up here in 2 ways, one not predicting the AI CPU boom (to be fair, did anyone else really do that either?).

And two, planning node capacity in such a way that they aren't able to supply their worse CPUs in a supply shortage, at least without more lag time than necessary.

Reducing Intel 7 capacity is definitely more excusable though; it's really how they planned for Intel 3 and 18A that should be criticized.

[-]

Vigilant256@reddit

I can understand you didn’t see the AI boom, but why couldn’t you pivot fast enough? I doubt AMD foresaw the AI boom either, but they quickly pivoted towards it and have a product as well.

[-]

Exist50@reddit

I mean, it seems simple. The two main reasons he was fired were overspending and underdelivering on Foundry, and failing to have a meaningful product offering for the AI boom.

[-]

Exist50@reddit

RPL is the least of their problems. Look at SPR. It took like 15 steppings to get out the door, which delayed it 1-2 years. That's what they want to avoid.

Granted, that was largely because BK had the bright idea of laying off Intel's server pre-silicon validation team. After all, quality was so good, who needs them!

[-]

Demian52@reddit

Funny thing about that, I was on DMR's validation team and got laid off after training on its architecture for a year. In fact, a ton of the validation staff on that project were cut. History repeats itself I suppose.

[-]

Seanspeed@reddit

After the RPL disaster that turned out to be a PR nightmare for Intel and resulted in Gelsinger's 'early retirement,'

Gelsinger wasn't sacked cuz of some 13900k's having issues. lol

[-]

R-ten-K@reddit

No. Gelsinger was sacked because he had shown a pattern of consistently missing major trends and the board, and most large investors, lost confidence in his tenure (and rightfully so).

[-]

thegammaray@reddit

He was sacked cuz he spent tons and tons of money towards fabrication advancements but hadn't yet shown financial rewards from doing so.

Is that really why? To me it always seemed more likely that the Falcon Shores failure/cancellation got him canned.

[-]

Geddagod@reddit

but hadn't yet shown financial rewards from doing so.

I mean Gelsinger warned everyone that it was a long term project. The problem wasn't the lack of financial rewards yet, it was the lack of... any sort of "rewards" yet. No customers, no process leadership, and delays.

[-]

spicesucker@reddit

At the same time “No Blame” culture has largely been replaced by “Just Blame”, they’re similar but if you should have known better you’ll still get it in the neck

[-]

Panaka@reddit

“No Blame” culture has always had a carve out for negligence. This is just MBAs rebranding a decades old concept as something new.

[-]

Free-Competition-241@reddit

You hear accountability and call it blame.

[-]

nittanyofthings@reddit

But "just blame" is the ideal right? Tan's problem is he is using the end result as the determinant. You can do the right thing and still have it not work out.

[-]

Exist50@reddit

But "just blame" is the ideal right?

How?

[-]

Far_Piano4176@reddit

i think they interpreted "just blame" as "being just in the assignment of blame" rather than "only blame", but the latter is what spicesucker meant.

[-]

Free-Competition-241@reddit

Making computer chips isn’t about cuddly wuddly post mortem meetings.

“Hey Bob, we don’t want to point fingers, but it turns out the space shuttle blew up due to a bug you introduced.”

[-]

whyte_ryce@reddit

It’s not completely wrong. Intel engineers got too used to having a captive fab and having projects with like 8 steppings. Not enough reliance on quality pre si to stamp out bugs, in the distant past at least. Lots of PM mishandling and politics over not dropping features that weren’t ready to go on A0 and would be enabled for the first time on B0. Apparently one of the things Keller tried to stress was that this isn’t how the rest of the industry worked and they needed to shift to a “if it’s not ready it doesn’t make the cut” mindset

Lip seems to have just taken that to an extreme

[-]

Exist50@reddit

It's a complicated problem, but Intel's historically understaffed in pre-Si validation. I mentioned in another comment that Intel's server troubles can be traced in large part to Intel largely laying off the server pre-Si val team under BK, but even on the client side, their staffing is more like 1:2 dev:val vs something like 2:1 at the likes of Apple.

Also, the fab might have enabled more steppings for "cheap", but they also drive additional ones themselves. No other fab would introduce so many late, design breaking changes and expect the design team to keep up.

[-]

whyte_ryce@reddit

Intel brought in a lot of people that actively disliked validation and thought validation spent a lot of time doing science experiments and finding useless bugs no customer would actually see.

[-]

max123246@reddit

Validation work almost always gets paid less because of archaic ideas of who the "value producers" vs "cost centers" are at a company. Complete bollocks but you see people denigrate support engineers and infra to this day

[-]

Exist50@reddit

IVE? Heard they often got the short end of the stick.

[-]

Remarkable-Deer-6721@reddit

You are kissing the hand of your slaver after he crushed your teeth. Disgusting

[-]

hollow_bridge@reddit

politics over not dropping features that weren’t ready to go

expect more of this, the new policy will significantly increase slowdowns.

[-]

Capital-Froyo-4359@reddit

Not sure why you're acting like getting fired for doing a bad job is some crazy idea. Where have you ever worked that people never face responsibility?

[-]

Exist50@reddit

The question is who gets the blame. Design for writing the bug? DV for not finding it? Arch for feature complexity? If one's job is at stake, a lot of effort will be spent assigning blame.

[-]

SlamedCards@reddit

Lip-Bu addressed it during the interview, saying that he only wants to hear bad news. Share problems so it's everybody's problem. It's when you hide things that you get canned

[-]

Exist50@reddit

That's good. And I do think he doesn't mean this statement as something ICs should worry about. Project managers, maybe, but then that's as reasonable a point as any to assign blame.

[-]

capacity04@reddit

Eh, not sure this is a good take. There's a balance. If you want to run an industry leading semiconductor company you need to have high standards

[-]

elkond@reddit

Lip Bu fired people on the IC levels and hired VPs. So even if u try to give it a generous read, his hiring practice makes it impossible to make it a reality

[-]

Vigilant256@reddit

This statement is not fully accurate. Lip-Bu removed quite a number of VPs and very high level ICs. These high level ICs do not do much engineering and work more on PowerPoint slides.

[-]

elkond@reddit

dude comes in. says "im flattening the management structure". proceeds to fire ICs, lowest level managers (who were effectively ICs too), ye some VPs were let go too, but number of those hired outweighed the fired ones (and i saw far more saying "ye i took position at X" rather than getting the boot)

those high level ICs now are core people at companies like Cerebras. this is just an old man doing old, "strong man" management style disaster

[-]

Vigilant256@reddit

No the VPs who were let go were definitely more than hired. Let go here means those that say “retired” and those that knew they are going to get the axe and quit to go another place.

Intel VPs were bloated; in 2024 they had like 120 VPs. In comparison AMD only has 20+ VPs. Apple had like 40+ VPs. For some reason Pat thought adding more management could solve the problems.

[-]

W0LFSTEN@reddit

How many VPs? In which areas? What was his rationale?

[-]

elkond@reddit

ccg software engineering, more than 10, fuck if i know

[-]

hollow_bridge@reddit

Timing is completely wrong, if they did this last year fine; with the current bubble profits this is just going to make employees want to leave.

[-]

mattybrad@reddit

This is how a lot of high paying jobs are. They pay you a ton because you need to do hard things. If you can’t do the hard things, then what are they paying you for?

[-]

doscomputer@reddit

seems like everyone forgot about the defective 13900k/14900ks

[-]

DeuzExMachina_@reddit

The same type of environment that got Intel into its current state under Krzanich

[-]

gburdell@reddit

Apparently you never heard of Andrew Grove

[-]

Immediate_Fig_9405@reddit

I would sell all stocks

[-]

Wisniaksiadz@reddit

I dream that one day the whole world will just start using poka-yoke and make systems where human error doesn't break shit

[-]

lukfi89@reddit

Beatings will continue until morale improves.

[-]

mennydrives@reddit

Honestly, they deserved this. Intel's designers have basically forced the fab team to fix their fuck-ups for years.

Now they have to play catch-up to AMD 'n Nvidia.

[-]

W0LFSTEN@reddit

Morale is quite high at Intel... If you have RSUs. Not sure how the plebs are feeling though.

[-]

wrhollin@reddit

Well, the techs would feel better if you didn't call them plebs.

[-]

W0LFSTEN@reddit

The plebs will grow to understand their place

[-]

INITMalcanis@reddit

"Don't report anything above a B0 issue, got it"

[-]

PilgrimInGrey@reddit

Expert chip designers in the comments. Industry standard is an A0 tapeout. It means highest quality and wide pre-silicon validation coverage. SPR went until C0. Everything after MTL have been B0 steppings. LBT is enforcing what industry follows within Intel. Of course whiners here have no idea how the industry works.

[-]

R-ten-K@reddit

FWIW “A0 tapeout” is common industry shorthand, but not an "industry standard."

Most major vendors have their own proprietary validation, bringup, and FA flows, etc. along with their own internal terminology, release criteria, and stepping definitions. A lot of that development and debug process is kept very close to the chest.

There are some common conventions, however, an NVDA stepping label, validation milestone, or debug flow should not be assumed to map cleanly to Intel’s or AMD’s. The same terms can carry very different implications depending on the company, product line, and internal process.

[-]

ACuriousIdiotDev@reddit

Thanks Claude

[-]

PilgrimInGrey@reddit

Thanks for trying to explain my work to me

[-]

R-ten-K@reddit

you're welcome.

[-]

PilgrimInGrey@reddit

You did a really bad job anyway

[-]

R-ten-K@reddit

K.

LOL

[-]

gburdell@reddit

I've been involved in a shitload of tape-outs at a few different companies and the best I saw was B-1 (a couple of test chips -> A0 -> B0 -> B1)

[-]

R-ten-K@reddit

Your experience tracks. Studies show 80-90% of ASIC/SoC programs still require at least some level of respin, errata closure, package/process tuning, etc after bringup.

First silicon success is rare for any design of moderate complexity, regardless of team/organization. I have no idea what some commenters were smoking.

Even on some of the most aggressive execution-focused teams I’ve worked with, nobody realistically assumed A0 would pass. There’s a reason there are so many packed daily flights between Taipei and the Bay Area during bringup season lol.

[-]

ElementII5@reddit

So the question arises: is doing process validation through production silicon smart or lazy?

I guess TSMC using test chips is unavoidable as they are a pure foundry. Intel doing it through production silicon on one hand dials in the process to their product but also constrains the overall broadband applicability for external customers?

[-]

hwgod@reddit

SPR went until C0

SPR went well beyond C0.

Everything after MTL have been B0 steppings

Absolutely not B0. B-step, maybe. If you ignore the interleaving steppings for -U vs -H or the steppings for ARL-20A.

[-]

PilgrimInGrey@reddit

I was on SPR

[-]

Geddagod@reddit

Seeing how SPR went, makes sense.

[-]

CopperSharkk@reddit (OP)

ARL LNL and PTL shipped at B0

[-]

hwgod@reddit

B-step, yes, but I don't think B0. Intel also doesn't publicize every stepping they make (they use steppings as more of a public versioning system), but sometimes it shows up as a suffix.

[-]

Fr0stCy@reddit

You’re right. A0 to production happens when management properly commits resources for Formal Verification.

Also important to define what is an acceptable risk and what isn’t, because you can’t catch everything in FV.

[-]

jedijackattack1@reddit

A0 is a hopeful standard. But for any high-risk product, or a product with a large number of new features, B0 tends to be the expectation, with A0 being a massive win if it's possible to do it. I do love my A0 firmware workarounds.

[-]

darknecross@reddit

Agree, sometimes we weren’t even feature complete until B0…

[-]

onnie81@reddit

Haswell was stepping 2 H4, that transactional memory bug didn’t live well

[-]

OttawaDog@reddit

Next he will fire anyone that takes extra time to get a chip project done.

Fired if you don't take extra time to find bugs, fired if it takes you extra time to find bugs.

[-]

Raigarak@reddit

Yeah, not like TSMC makes their employees be on call 24/7 and probably have an even stricter policy.

[-]

KellyShepardRepublic@reddit

They stick to making the chips though, not the whole stack. That allows them to not care for their employees as long as management is competent in the role.

[-]

iSGAFF@reddit

Read this as “Incel CEO”. Was very disappointed. Twice.

[-]

Capital-Froyo-4359@reddit

Diamond Rapids delays gotta be costing them Billions. I can certainly understand the concern.

[-]

Exist50@reddit

Well that isn't quite the same thing. If you insist on few steppings, a natural consequence is taking longer to ensure quality on the first stepping. Delays are fundamentally about mismatches between expectation and execution reality. Whether that manifests as more steppings, pushing A0/B0, more bugs, or all of the above, the root problem is that mismatch.

[-]

Vigilant256@reddit

Then they’ll compare your teams capability with the competition. If the competition can do it with less headcount, why can’t you?

[-]

Interesting-Rock2474@reddit

Should you find out that the competition is more capable, it does not change your capability. Should you want to lower the number of steppings, your validation needs to be better.

You cannot simply say, look at the competition, they can do it in x (time/steppings), so we should also be capable of that. Improvement comes when you examine your processes and try to improve them.

Intel has had multiple problems with silicon validation, for example (non-exhaustive): lots of custom silicon not synthesised, lots of design creep/changes in general, and a program for chip design validation that is worse than industry standard.

[-]

Vigilant256@reddit

Then fix it. If you have more VPs, more fellows, more management, then why can’t you fix it? And why are your processes behind your competition despite being established 10-20 years earlier than your competition?

[-]

Minced-Juice@reddit

The "delay" is referring to 384 core and 512 core variants which weren't known to exist till the rumors of this delay started appearing.

Before this there was still confusion on whether DMR would be 192-core or 256-core.

[-]

Geddagod@reddit

Depends on what the root cause of the DMR delays is. Maybe Intel throws the packaging team under the bus publicly like they did for CWF.

[-]

callmedaddyshark@reddit

If your kids are never allowed to make mistakes, you don't raise perfect kids, you raise liars

[-]

ResponsibleJudge3172@reddit

A0, A1 and so on are mistakes. B0 is major revisions.

Anything after? Well, who knows. But you completely kill cadence with that. Finding and sorting out any major bug in the B0 is likely the target. Don't do piecemeal prep launch steppings.

Pretty sure someone pointed out that either H100 or A100 was A0, for comparison.

[-]

Exist50@reddit

B-step is more the standard for complex SoCs, but Nvidia does often ship on A-step for their GPUs. Not always, but often. GPUs tend to be more amenable to sw/firmware workarounds, however.

[-]

max123246@reddit

A good litmus test would be to checkout what step Nvidia shipped Grace-Hopper since that was a CPU

[-]

Exist50@reddit

Well, it's 2 separate dies. So each one would have its own stepping. Not sure if anyone's reported on the stepping for Grace though.

[-]

max123246@reddit

Yeah, I was more saying whether the CPU Nvidia developed was done faster than Intel CPUs, to rule out the GPU chip being maybe easier to hide defects

[-]

ResponsibleJudge3172@reddit

To be fair, the MediaTek CPU-GPU partnership (where I guess MediaTek is responsible for the CPU side) is likely past B stepping and may be at C or more, given the delays and rumors of showstopping bugs

[-]

Jonny_H@reddit

I feel that's more based on the accuracy of their simulation and emulation, and ability to "absorb" and work around bugs, than anything else.

There's a lot of things that can be "hidden" by a closed graphics driver and shader compiler, after all. Then the only reason to re-spin is if the cost really outweighs the benefits of that particular feature or instruction.

[-]

max123246@reddit

Hopper was a pretty big mistake needing software emulation to regain accuracy lost by the hardware implementation for example

[-]

Jonny_H@reddit

I really mean emulation of the hardware design before it's actually put into silicon rather than a "possibly incorrect design focus decision" like that.

Possibly simply because I was somewhat separated from that level of decision making - by the time I saw things that was years ago :p

[-]

Exist50@reddit

Yeah, beyond just relative complexity, CPUs are kind of screwed in this regard because, unless you're Apple, the software contract is at the ISA level. You can use microcode, but it's not as powerful a tool as a full on driver stack.

[-]

max123246@reddit

Also Nvidia's ISA, PTX is not the actual assembly, it compiles to SASS which is a per GPU architecture assembly

[-]

ResponsibleJudge3172@reddit

Nvidia also apparently just left 2 TPC non functioning in their AD103 leaving it at 80SM compared to not only rumors, but even the files that were hacked saying 84 SM for AD103 just to prevent delays.

Maybe Lip Bu Tan would be the same. For example, Intel 12th Gen couldn't get AVX512 working. Instead of workarounds that delay things, disable it from the start and cut off the feature early. Do the final revision by the B0 spin.

[-]

RailFan65@reddit

Thankfully the engineers working at Intel are not kids and can understand adult things like high stakes and doing your job correctly to get paid.

[-]

W0LFSTEN@reddit

I think Intel is less concerned with raising perfect children and more concerned with having highly technical and competent operations.

By the way, how big of a mistake would an E5 stepping be in children world?

[-]

callmedaddyshark@reddit

Consider this scenario: John is human, got unlucky, and introduced a huge bug. It is in Intel's best interests for remediation to start immediately, to limit the damage. However, John knows that when someone finds it was his mistake, he'll lose his job. Here's John's plan:

1. Hide the bug as long as possible. Obfuscate documentation. Alter tests to pass regardless. Let it get all the way to production. Downplay and dismiss bug reports even after it's on the shelves. He'll get a few more months salary that way.
2. Half-ass his job the remaining time he's there. Be less productive and don't care about introducing more bugs. He's spending most of his time on the cover-up anyway.
3. Become a farmer. Forget all his specialized knowledge and expertise in semiconductor design. Let all his institutional knowledge die with him instead of passing it to the next generation. Never be the hero who catches the same bug early next time, because he no longer works at Intel.

[-]

intronert@reddit

Nailed it!

[-]

azn_dude1@reddit

That doesn't explain how someone "getting unlucky" and introducing a huge bug results in an E5 stepping

[-]

W0LFSTEN@reddit

He found a bug he introduced after tapeout? It doesn’t really matter what he does after that point, it’s not his job to validate the chip. Out of his hands completely. He can come clean, or he can just wait for someone else to find his colossal fuckup.

Let’s say you’re a doctor. You accidentally leave your medical scissors inside someone’s chest cavity during open heart surgery. Dr John knows that when someone finds his mistake, he’ll lose his job. Here are Dr John’s plans:

1. Hide the scissors as long as possible
2. Half ass his job, spend most his time arguing in medical court that it was actually one of many nurses, interns and assistants that did it
3. Become a farmer

[-]

SmokingPuffin@reddit

By the way, how big of a mistake would an E5 stepping be in children world?

It's roughly a baseball team full of mistakes. You can't get this level of failure in any one mistake.

[-]

b3081a@reddit

B0 is the chance to fix all your mistakes after numerous A* steppings. AMD has been shipping CPU/SoC chips including major changes with B0, and minor changes with A1 or even A0 for years, and there's nothing wrong about this approach.

[-]

Vushivushi@reddit

B0 is plenty of tolerance for mistakes?

Rather than encouraging liars, it encourages teams to be honest and realistic with their goals on the get-go.

[-]

Arch-by-the-way@reddit

Redditors ask for higher standards then complain when they learn what higher standards look like

[-]

jaxspider@reddit

No one in the tech industry hates high standards. There is a way to communicate your message without sounding like a dickhead. You'd think the CEO of one of the two biggest CPU manufacturers on the planet would be such a person. I guess that high standard has yet to be attainable.

[-]

Exist50@reddit

This has been a major sore point for Intel (SPR shipped on what? E5, F0?), but I'm suspicious he's actually willing to follow through with this threat. Even then, seems more applicable for project management than the rank and file.

[-]

jedijackattack1@reddit

Wait they actually got to tape out F? Wtf how do you screw up that bad.

[-]

Exist50@reddit

If you include the metal steppings, Sapphire Rapids ended up around 14 or 15, iirc. 12 at minimum. And that was just for the first (real) launch. If they've done more since, I wouldn't know.

Wtf how do you screw up that bad.

Simple. Lay off your server validation team, scope creep to hell and back, have a buggy, PIA process node, and spam steppings as a TTM strategy. And have no real accountability for any of it.

[-]

Grizknot@reddit

eli5 what is a stepping? what does this all mean?

[-]

droptableadventures@reddit

A stepping is a new version of the same chip with minor changes because they shipped with part of the chip not correct (sometimes it's because they've identified ways to make it better, but usually just bug fixes).

Issues with the CPU can usually be worked around in software but there are often performance tradeoffs as a result. So usually this is a bad thing because it means there's a version of the chip already sold to people with a known problem.

[-]

Grizknot@reddit

yikes... happy that I've avoided this since I got an amd cpu 6 years ago and have been happy with it so far

[-]

cyborgedbacon@reddit

This isn't strictly limited to Intel; AMD and any other CPU manufacturer release revised versions.

[-]

Exist50@reddit

"Spam steppings as a TTM strategy" means they released it without properly checking their work, because they wanted to be selling it and making money sooner.

That's not what I meant here. Since the pre-Si validation team was so lacking, and pre-Si in general lets you get fewer cycles in than post-Si, Intel decided that the fastest way to get SPR healthy and to market was to rapidly churn through bug fixes in silicon.

[-]

Exist50@reddit

You can think of a processor as consisting of a layer of transistors, topped with multiple layers (about 10-20) of wires. A "stepping" is a revision to the design that changes one of these. The letter (e.g. A, B, C) indicates a change to the transistor layer, while the number indicates a change to the wires, and gets reset on a change to the transistor layer (changes both).

So e.g. B3 stepping would be the 2nd version of the transistor layer, and 4th version of the wires for that transistor layer. Changing the wires (aka metal layer) is easier/faster/cheaper than changing the transistors (aka base layer), but gives you fewer "tools" to fix bugs.

Btw, the other reply misrepresents what I meant by "spamming steppings". Replied to him/her with a better explanation.
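For anyone who thinks better in code, the letter/number scheme described above can be sketched in a few lines of Python. This is only a rough sketch, assuming the simple "one letter plus digits" label format from this comment; `Stepping` and `parse_stepping` are made-up names for illustration, and real vendor labels may not map this cleanly.

```python
import re
from typing import NamedTuple


class Stepping(NamedTuple):
    """Hypothetical model of a stepping label such as 'B3'."""
    base_rev: int   # revision of the transistor (base) layer: A -> 1, B -> 2, ...
    metal_rev: int  # revision of the wires (metal layers) for that base: 0 -> 1, 1 -> 2, ...


def parse_stepping(label: str) -> Stepping:
    """Split a label like 'B3' into base-layer and metal-layer revision counts.

    Assumes the simple letter-plus-number convention described in the thread.
    """
    match = re.fullmatch(r"([A-Za-z])(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized stepping label: {label!r}")
    letter, number = match.groups()
    # The number restarts at 0 whenever the letter changes, so the pair
    # (base_rev, metal_rev) orders steppings chronologically as a tuple.
    return Stepping(ord(letter.upper()) - ord("A") + 1, int(number) + 1)


print(parse_stepping("B3"))  # Stepping(base_rev=2, metal_rev=4)

# Tuple comparison gives the expected order: A0 < A1 < B0 < B1.
print(sorted(["B1", "A0", "B0", "A1"], key=parse_stepping))

# "Anything above B0" then becomes a plain comparison.
print(parse_stepping("E5") > parse_stepping("B0"))  # True
```

Because the metal number resets on every base-layer change, comparing the two counts as a tuple is enough to order steppings; it says nothing about how many steppings were actually taped out in between.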

[-]

Deciheximal144@reddit

Did they ever restore the server validation team?

[-]

Exist50@reddit

Keller spent a lot of time building it back up, afaik. DCAI is still a shitshow, but objectively the tape out quality on GNR/SRF is much, much better than it has been for many years.

Granted, last I heard that specific dev team got reassigned to AI stuff, so no idea who's left on server CPU, or what they're working on. Nor do I know what more recent layoffs/attrition have done to that team.

[-]

that_dutch_dude@reddit

there is no way that management gets fired before rank and file does.

[-]

someusername5873@reddit

Project management aren't the same thing as managers.

https://www.quora.com/What-is-the-difference-between-project-managment-and-managment

[-]

that_dutch_dude@reddit

it's hilarious people actually believe those people will get fired before the peasants in the field doing the actual work get booted.

[-]

someusername5873@reddit

A tech project manager is more like a restaurant expediter: coordinating the flow of work, tracking blockers, and yelling that the sides aren’t ready for the steak. The GM or kitchen manager manages people. The expediter manages the order flow. Same idea: PMs often manage projects, not employees.

[-]

that_dutch_dude@reddit

you desperately try to make the name of the job make a difference. its REALLY simple: targets are not met = the peasants below you get fired. if there is nobody below you: congrats, you are the peasant.

[-]

someusername5873@reddit

That's my point and that's /u/Exist50's point. There is no one underneath project management...

[-]

that_dutch_dude@reddit

calling something management doesn't make it management. you can call something like "oversight manager for financial transactions at a multi-national corporation" or call it the more common job title: the guy at the drive-thru booth at mcdonalds. just like "urban pharmaceutical distribution specialist" and drug dealer are the same job.

[-]

someusername5873@reddit

Huh, you're agreeing with me then. That's what I've been saying... that project management is NOT management. Are you able to read?

[-]

narwi@reddit

He will. Morale will decrease and errors will increase. After another iteration of Intel downward spiral, they might rethink it.

[-]

someusername5873@reddit

Hopefully people don't game it and just reduce the bar for validation...

[-]

Exist50@reddit

Well if scaling back scope actually let them ship on time, then that's probably what they should do. If it means customers are the ones who end up finding the bugs, however...

[-]

someusername5873@reddit

I’m all for reducing scope. What worries me is cutting corners on validation such that everything looks green, even though the design hasn’t actually been exercised as thoroughly as it needs to be.

[-]

mrheosuper@reddit

Something something "When a measure becomes a target, it ceases to be a good measure"

[-]

Jonny_H@reddit

This sort of thing "might" work if they can possibly decide blame - but that's functionally impossible for the scale of complexity they're working at.

Even ~10 years ago I worked at a third party IP provider on an Intel project, and it was a mess - every engineer wanted off the product as it was "low status" on the Intel side, there was a constant "blame game" for issues, and it ended up shipping with fundamental features disabled due to what, from my perspective, were entirely Intel issues (PDK and hardening issues - which were out of our hands as we only supplied RTL in the first place).

It was common that the "Engineering Contact" we had changed every month - if we were informed at all and weren't just sending emails to a black hole.

And then even to this day the IP provider was dragged through the mud due to incorrect statements - like we didn't provide driver code or similar - when that was very much not the case.

[-]

intronert@reddit

Intel has been known for a long time as having a cutthroat corporate culture, and this will just make it worse. More effort will go towards shifting blame to someone else than to getting to the root causes of the bugs. Those who rise up the corporate ladder will be the best at screwing others and lying to bosses. So, business as usual. :)

[-]

jking13@reddit

I remember during the .COM boom hearing that at least in some locations, they had people sitting at the entrance to the building and if you walked in at 8:01, you were written up. People running late would just sit in their car for an hour or two, because by that point, they assumed you had been at another location for meetings and such and just let you walk in.

[-]

intronert@reddit

I think this was called the “125%” solution. Intel mgmt decreed that everyone would work 10 hour days instead of 8 hour days, from 8am to 6pm.

I heard the same thing about waiting until the guards left.

[-]

awshuck@reddit

“Our systemic failures and horrible culture are now your personal problem”.

[-]

DehydratedButTired@reddit

Just sounds like more layoffs, disguised as quality control.

[-]

justgord@reddit

.... and now every engineer is too scared to innovate.

[-]

scytheavatar@reddit

Intel's main problem which led to their downfall has been their overeagerness to innovate. To the point that they have been out of touch with their customers and what products actually make money. Lunar Lake being too good and a pyrrhic victory as it killed their margins is the best example of this. Maybe having the fear to innovate will be a good thing for Intel, it was something that helped TSMC reach its monopoly.

[-]

claytonkb@reddit

^ This is the real truth. Alumni here. It's really just a question of tradeoffs. Sr. management should not push down the consequences of their choices in tradeoffs. If we (engineering) clearly explain the risk/reward ratio to management, we have done our job. The product-planning/market-positioning calls made by Sr. management are risk calls that they make, not the engineering teams. The more "features" and performance requirements you stuff into the next SKU, the more likely it is that one of those features or requirements could result in a stepping-bug. The combined experience of all engineers on the team is insufficient to predict such bugs... precisely because these bugs occur. That is, if it were possible to detect/foresee such bugs (on the given timeline), the masks would never have been shipped to the fab in the first place, the bugs would have been fixed. So, it's an iron-triangle: fast, correct, features -- pick 2. You can have it fast and correct, but drop the features; you can have it correct with all the features, but it's going to take a long time. And if Sr. management demands all the features, and fast, it won't be correct (many steppings)...

[-]

W0LFSTEN@reddit

This is a chip design and production company that has had persistent issues designing and producing chips. This is a massive issue for Intel. Holding people accountable for their literal job description is not some outlandish idea. If I cost my company tens of millions, I would fully expect to be laid off. Maybe I’m just old fashioned 🤷

[-]

mnmevan@reddit

People in the comments didn’t have to deal with the fallout of the 13th/14th Gen Vmin Shift defect. Intel has made my job hell for the past few years. Enough is enough!

[-]

mlloyd@reddit

Enough is enough!

Oh yeah, that told them! They're definitely going to stop screwing off now!

Seriously, what's going to happen is all the folks who were reinvigorated under the leadership of an actual engineer who invested in product to do it right will go to places that value them while the B and C teams stick around and cut features until B0 or die is possible.

So what that really means isn't B0 or die. It's B0 and Intel dies as they become Cirrus Logic due to an inability to be ambitious enough to keep up with the market leaders.

[-]

SubmarineWipers@reddit

An argument can probably be made that you _can_ improve the process, validation, testing and mindset of people, even without creating an atmosphere of a gulag in your company.

[-]

Exist50@reddit

RPL was a success, from this perspective. It shipped on B0.

[-]

Exist50@reddit

I don't think anyone is disputing that quality matters and that there needs to be accountability for poor work, but for a bug to reach B0, it's already had to pass through several teams, much less individuals. Which one gets the blame? If not none, then all? Well you can't fire everyone, so then what? There's no good answer here, and it's a distraction from the main issue of how to improve quality from a development process perspective.

And if quality is a consistent issue across the entire company, that's clearly no longer an IC problem, but rather a management one. And people are justifiably worried that management failings result in IC fallout.

[-]

UpsetKoalaBear@reddit

This is the issue I see.

There is little to no way to hold people accountable for things like this (unless they drop the wafer or break the machine lol).

There are several teams involved before B0, so you would need to find a scapegoat. It’s just hostile all around.

[-]

ncook06@reddit

Yeah, I don’t know why people are acting like this is toxic work environment stuff. My last big tech employer had such a positive culture that the core product stagnated and the lack of growth led to mass layoffs. That is a much more toxic work environment.

[-]

namotous@reddit

Lolll you want them to do better? You gotta throw them incentive.

[-]

intronert@reddit

Perhaps the loss of a finger for each bug then?

[-]

Katent1@reddit

I mean it will hold for management too, right? Right?! xP

[-]

ashvy@reddit

Yeup, thems gonna get perf bonus for finding and addressing the "problem"

[-]

Polar_Banny@reddit

Hold on, you are too fast with assumptions.

[-]

that_dutch_dude@reddit

it should be clarified: Tan is a "shareholders CEO". he is there to make sure the line goes up with no regard of the bodies it leaves in its wake.

intel will not survive him. but the shareholders will cash out before that so they don't care.

[-]

W0LFSTEN@reddit

They are down almost 50k employees since 2022 and yet they are pumping out some of their more competent products in years. The company is having difficulty making a profit and yet you insist that for the company to survive they need to care less about the line going up and more about harboring employees who have had no meaningful positive impact on the company’s trajectory.

[-]

elkond@reddit

they are pumping out products designed half a decade ago, u seriously underestimate how long hardware development cycle is

[-]

W0LFSTEN@reddit

No, I am not underestimating hardware development cycles. And the majority of the cuts aren’t even from design teams.

[-]

Exist50@reddit

and yet they are pumping out some of their more competent products in years

Because of or in spite of him?

The company is having difficulty making a profit

Even with all their quality problems, Intel Products is still very profitable. It's the fab that's dragging down the company as a whole, and this proclamation doesn't affect that either way.

[-]

that_dutch_dude@reddit

Tan is riding the results of gelsinger. the current "high" is not his doing, its Pat's.

[-]

Vigilant256@reddit

Not fully. Pat did the right thing on the foundry, but didn’t know how to execute. The people that he chose to manage weren’t very good, and I’m surprised that he put a head of people (from HR) to lead data centre. Then he assumed PC demand would go on forever. He wasn’t very careful with finance either: despite knowing that the foundry would require lots of cash, he didn’t control spending, and instead hired more and more VPs.

[-]

Exist50@reddit

Pat, I'd say 50/50. Objectively, his foundry push was a disaster, and not having an AI story is also on him. I think a lot of what Pat said was sensible, but what he did in practice (particularly towards the end), and especially the management he put in charge were not conducive to his goals.

and booted him before they could see it have results

They did see results. Pat gave them a timeline for 18A and the promise of customers. He failed to deliver.

[-]

W0LFSTEN@reddit

Tan’s levers in the short term are more operational and financial than engineering. My point was regarding employee count.

Intel Products is doing quite good. The company is finally seeing some meaningful growth due to AI.

[-]

Deciheximal144@reddit

I interpreted the user's comment as saying the company should care more about the line going up in the long term than in the short term. That is, people are making money now and letting the whole thing crash out later.

[-]

W0LFSTEN@reddit

Caring too much about the long term would have been the quickest way to bankrupt Intel.

The company *needed* to focus on the short term because they weren’t making any money, were losing share, taking on debt and it was in doubt whether they even had a future…

Genuinely, wanting the company to make even more long term investments *than they were already making* would have been one of the most short term solutions the company could have made. It would have burdened the company beyond its means.

[-]

that_dutch_dude@reddit

thats not what i said. but thanks for your input.

[-]

W0LFSTEN@reddit

>[Tan] is there to make sure line goes up with no regard of the bodies it leaves in its wake. Intel will not survive him

By “line goes up” did you not mean “cares about financial metrics”?

By “bodies it leaves” did you not mean “fired employees”?

By “will not survive him” did you mean he will bankrupt the company?

I would love to hear your insight.

[-]

65726973616769747461@reddit

And is that somehow worse than a CEO who just coasts on a golden parachute, perfectly content to watch the company burn, regardless of whether it tanks today or tomorrow?

[-]

LuluButterFive@reddit

So he is like every CEO?

[-]

Fr0stCy@reddit

A0 to production has been the dream, but it means accepting an extensive burden of Formal Verification. You either need the manpower to work alongside the design team running simulations and Boolean equivalence constantly, or commit to a 4-8 week design freeze at the end where all of that is done.

I’ve been a part of A0 to prod, but it only happens when management properly commits resources to make it happen. If you’re on C0/D0, either the requirements keep changing under you or the management structure doesn’t work.

[-]

gomurifle@reddit

He is a finance guy right? Doesn't sound like an engineering background.

[-]

bubblesort33@reddit

And then he'll replace you with AI powered by the chip you were designing.

[-]

Xanthyria@reddit

Does no one remember the G0 Q6600? That thing was a beast!

Also absurd how far it went on the stepping

[-]

MediocreAd8440@reddit

The accidental CEO is at it again

[-]

Akeshi@reddit

Ordinarily I'd acknowledge how crazy this is, but I'm still having to put up with a broken 13900k. $4k PC and it can't reliably run even a single command line application.

[-]

FastHotEmu@reddit

Ah yes, Lip-Bu Tan, the former CEO of Cadence whose company pleaded guilty to conspiracy to commit export control violations and had to pay $140 million in fines all under his watch.

[-]

tverbeure@reddit

Over the years, leather jacket man has done plenty of all-hands employee meetings where he said just the opposite: if you focus too much on accountability, people become too careful, won't take risks, and eventually a competitor will eat your lunch. Which is exactly what happened to Intel.

[-]

xXBongSlut420Xx@reddit

fucking ridiculous. This will cause more issues than it solves. People will be afraid to report issues. Any failure should rightly be considered a process failure, not a personal failure. Even if someone does something really stupid, there should have been systems in place to prevent that. Making people afraid to make mistakes does not in fact reduce mistakes.

[-]

Nippius@reddit

They looked at Boeing and learned all the wrong lessons...

[-]

imaginary_num6er@reddit

So people should already start looking for jobs at AMD if they know a fix is needed, but it hasn’t been brought up yet.

[-]

W0LFSTEN@reddit

Why aren’t fixes being brought up? That unironically does seem like a good reason to see your employment terminated.

[-]

Exist50@reddit

I think they're taking it too literally, and for the wrong people, but if you assume that bringing up that you introduced a bug gets you fired, then it would make more sense to try to hide it or distance yourself instead. But again, I think that's neither the spirit nor reality of this proclamation.

[-]

elkond@reddit

no, bugs sit in jira for years

[-]

RealThanny@reddit

I think the claim is that with this incentive structure in place, if you find an error after B0, you'll be fired if you report it. So errors will be hidden, perhaps until a new job is found. Or perhaps forever, until someone else finds it.

It's a real problem with this scenario, at least as described by the headline. If you'll be fired for letting errors get past B0 stepping, then you have zero incentive to point out errors that make it past that stepping. The only way to avoid that, which they may be doing, is having multiple isolated teams checking the same subsystems at different times, so that errors will be found by someone who won't be fired for it.

[-]

ListenBeforeSpeaking@reddit

Once they fire the person, I would think they likely fired the person most qualified to fix the problem.

[-]

Exist50@reddit

That's a bit of "firefighting by arsonists" though.

[-]

ListenBeforeSpeaking@reddit

They didn’t set the fire on purpose.

The problem with complex bugs is that the design is likely complex. They probably spent 12-18 months working on whatever their block is.

I wouldn’t want to be dropped into someone else’s complex problem, told that a solution is needed immediately, and know that if I miss something I’m going to get fired.

[-]

W0LFSTEN@reddit

I would question whether the people pointing out the errors are even the ones making them. Not sure how Intel operates, but it is not uncommon for these to be completely different teams.

[-]

adamrch@reddit

probably not the same people. But you gotta be a "team player" and not "rock the boat" so you get yourself and others fired for pointing out major issues.

[-]

elkond@reddit

Oh I assure you there's no correlation between how well u do ur job at Intel and being terminated xD

[-]

neuronez@reddit

Frankly if you need three base layer spins to get your chip right you belong in the software industry, not semiconductors

[-]

EmilMR@reddit

with that kind of stress they better pay their verification engineers a proportional salary.

at least he is not talking about AI doing the job.

[-]

RailFan65@reddit

Something tells me most people in the comments haven't had a job.

[-]

Enemiend@reddit

Let's see if this ends up resulting in noticeable activity in the semi-academic hardware verification community a few years down the line. Particularly the hardware verification competition and the like. Though unfortunately a lot of industry-developed tools don't (or cannot) participate. But it would be cool to see some advancements here.

[-]

Geddagod@reddit

A lot of the things LBT claims are eerily similar to what Pat Gelsinger also said.

He also constantly talked about how they need to reduce the number of steppings, and increase employee accountability, so tbd if things actually improve, and LBT's threats here actually result in a meaningful improvement.

[-]

CallMePyro@reddit

If anyone makes a mistake, kill them. Public execution. That will make sure everyone is careful and focused!

[-]

LuluButterFive@reddit

based on these comments im wondering how many people here have actually worked a day in their lives

[-]

SmashStrider@reddit

This move seems to be fostering a culture of fear rather than one of collaboration and excellence, something that would just inadvertently encourage a toxic environment or even outright lying to supposedly meet targets...

[-]

Specific-Path3179@reddit

It's a sound strategy tbh

[-]

ScienceMechEng_Lover@reddit

White people are gonna get a taste of Asian parenting and realise why they're so successful 🔥💪.

[-]

Deciheximal144@reddit

That's right, crack that whip!

Hey, why are all the horses leaving?

[-]

whyte_ryce@reddit

One Intel project started at something like X0 or Z0 because a major feature wasn't going to work but they wanted something out the door asap to test the other stuff. So the company already has a hack for this mandate

[-]

Vaguswarrior@reddit

Ahh toxic work standards. That will help.
