A320 pilot explains what features were rolled back with the update

[-]

HelloSlowly@reddit

From what I’ve read, L104 was pulled because Airbus noticed it didn’t mitigate errors caused by radiation-induced bit flips (a 1 becoming a 0 or vice versa).

So this feature could easily return in L105 at a later date

[-]

It's hard to understand why a new version of software introducing new features can expose such a serious problem with radiation-induced bit flips on hardware that's been in service for several years...

[-]

nalc@reddit

You can't really shield from single event effects reasonably so mitigation is about doing calculations in different places and comparing the results. Could very well be that the new features weren't implemented with sufficient error checking - either as an oversight or as a way to work around computational capacity limits in older hardware.

[-]

NukeRocketScientist@reddit

There are really only two ways to physically lower the likelihood of bit flips from cosmic rays: one, redundancy, and two, increasing the distance between transistors. One of the downsides of making transistors as small as they are nowadays is that they are much more prone to bit flips. This is because when a high-energy cosmic ray proton interacts with a material, you essentially get a "splashing" effect of electrons around the path the proton took through the material. With transistors packed as close together as possible, this splash or shockwave of electrons is more likely to flow into a transistor, imparting a charge and causing a 0 to flip to a 1. Redundancy is important because a cosmic ray interacting with one computer chip wouldn't have any effect on another one nearby, and, like you said, it helps with error checking as well.

You could of course try to physically shield the computer, but trying to stop a cosmic ray proton is far easier said than done as they can travel at more than 50% the speed of light depending on energy. It can also be even worse if you don't stop the proton fully as cosmic rays stop both kinetically and electromagnetically causing the cosmic ray to impart more of its energy into the material than if it was at much higher energy. This is called the Bragg peak and is important in proton beam therapy for treating cancer.

Source: I worked in cosmic ray interactions with materials and semiconductors for my undergrad school's CubeSat program.

[-]

Eastern_Ad6546@reddit

The other option is to use Actor-Judge algorithm(s) and run multiple computers in parallel. You go from 1 flip causing errors to n/2+1 identical errors across n computers to induce failure.

This is what SpaceX uses for Falcon 9 and Dragon (human-rated). It was one of their big cost reductions and is now found everywhere, even on JPL hardware, which historically used radiation hardening.
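
A minimal sketch of the n/2+1 voting idea, assuming each computer hands back a plain comparable value. `majority_vote` and the sample numbers are made up for illustration; this is not SpaceX's actual flight software.

```python
from collections import Counter

def majority_vote(results):
    """Return the value that a strict majority (n//2 + 1) of computers agree on.

    Raises ValueError if no value reaches that quorum.
    """
    if not results:
        raise ValueError("no results to vote on")
    quorum = len(results) // 2 + 1
    value, count = Counter(results).most_common(1)[0]
    if count < quorum:
        raise ValueError("no majority: results disagree too much")
    return value

# Three computers, one of them hit by a bit flip (bit 6 of the value 5 flipped).
print(majority_vote([5, 5 ^ (1 << 6), 5]))
```

With three computers the quorum is two, so a single corrupted result gets outvoted; two computers would have to produce the same wrong value to fool it.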

[-]

Beni_Stingray@reddit

That was very interesting, thx.

[-]

k_marts@reddit

This guy cosmic rays

[-]

NukeRocketScientist@reddit

I mean I do have a cat named Proton! Coincidentally, I actually named him before I worked in cosmic ray physics, so it must have just been meant to be.

[-]

sittingwith@reddit

Thank you for sharing your knowledge.

[-]

NukeRocketScientist@reddit

Happy to!

[-]

Mattijjah@reddit

Fine, but still – if it's a matter of insufficient shielding, then this problem should have surfaced long ago – especially considering the number of machines in active service.

If we're saying that "cosmic radiation" is causing these problems on this equipment, then the logical conclusion is that there's actually a hardware problem, and updating the firmware up or down shouldn't make much difference. At least as long as you don't radically change the way you use the equipment (e.g., previous versions of the software had some built-in error correction, which has now been removed because, for example, someone decided they didn't have enough space for their feature...).

[-]

nalc@reddit

> insufficient shielding

Sufficient shielding isn't really a thing for this - it's radiation, not electromagnetic interference, so typical conductive shielding doesn't do much.

> logical conclusion is that there's actually a hardware problem

Yeah, the hardware problem is that small delicate electronics are sensitive to radiation.

> updating the firmware up or down shouldn't make much difference

It does. Let's say you store a value once in memory and it gets bit flipped. It will cause a problem. Now let's say you store it twice and check that both values match before doing anything. It won't cause a problem.
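
As a rough sketch of the "store it twice" idea (the `DuplicatedValue` class is hypothetical, just to show the shape of the check, and is not how any real flight computer does it):

```python
class DuplicatedValue:
    """Keeps two copies of an integer; a read fails if the copies disagree."""

    def __init__(self, value):
        self.primary = value
        self.shadow = value

    def read(self):
        if self.primary != self.shadow:
            raise RuntimeError("copies disagree: possible bit flip, do not use")
        return self.primary

pitch = DuplicatedValue(5)
assert pitch.read() == 5

pitch.primary ^= 1 << 10  # simulate a single-event upset in one copy
try:
    pitch.read()
except RuntimeError as err:
    print("detected:", err)
```

Two copies only detect the flip; they can't tell which copy is the good one. Correcting it needs a third copy or a checksum.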

[-]

Mattijjah@reddit

Shielding is quite important – if a firmware change alone can create such a problem, it means that the electronics are not properly shielded in the first place. Any software features intended to mitigate the problem should be merely a supplement/redundancy, and not, as is evident in this case, the main line of defence...

[-]

knobtasticus@reddit

You seem to be missing a fundamental characteristic of physical shielding: weight. Effective cosmic radiation shielding is a multi-layered, heavy process using thick, dense materials and/or substantial amounts of hydrogen-rich materials like water. And even with every possible physical barrier in place, there’s no guarantee of imperviousness. Furthermore, many materials give off secondary, just as damaging radiation when struck by cosmic radiation.

Weight is a substantial consideration for commercial aircraft, so the correct and sensible tactic for mitigating the risk of cosmic-induced bit flips is a hybrid of physical barrier and software-based error correction. The most recent software update impacted the effectiveness of this error correction. The temporary rollback has restored that effectiveness.

[-]

Mattijjah@reddit

Okay, but you seem to be overlooking another, even more important characteristic – reliability.

If you have a mission-critical device, on which the lives of the crew and passengers literally depend, can you really make such compromises, building it de facto defective and then trying to patch the problem with software?

How much will this additional shielding weigh? A few kilograms at most? Is it really that much?

[-]

RdPirate@reddit

We make cosmic radiation detectors under mountains 'cause cosmic rays care not. So about a planet's worth of weight?

[-]

Mattijjah@reddit

Yes, in a situation where we want to detect neutrinos and need to filter out 99.9999% of the remaining noise :)

Here, a relatively simple (and really lightweight) shield will suffice, weighing significantly less than a single overloaded passenger's checked luggage... ;)

[-]

RdPirate@reddit

But those particles also flip bits. And to you software error correction with similar failure rates was unacceptable, so.....

[-]

Mattijjah@reddit

Or properly designed hardware that won't be as susceptible to bit flipping caused by cosmic rays. ;)

That's called fail-safe design. If a simple software update, which happened to lack some protection from previous versions, reveals such potentially catastrophic hardware design flaws, then sorry, but no...

[-]

LandscapePenguin@reddit

Please, enlighten us. How do you design a piece of hardware to be immune from cosmic-ray bit-flipping? Let's take a simple flashing LED as an example. Be specific so we can all easily understand exactly how to properly design this hardware.

[-]

Mattijjah@reddit

Okay. Designing radiation-safe hardware relies on fault tolerance, not an impossible physical shield. For your "blinking LED", this is achieved through Triple Modular Redundancy (TMR), where three identical chips store the same state. A voting chip selects the majority result, immediately correcting the SEU at the hardware level before the error affects the output. In the ELAC case, the problem was that new software overrode this built-in TMR defence, giving a random cosmic particle time to trigger a fatal error. This is a process error, not a fundamental material defect.
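
A software model of that voter, as a sketch only: real TMR does this in hardware, and `tmr_vote` here is an illustrative name that assumes the voter itself is never hit.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three copies: each output bit is the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)

state = 0b1010                  # e.g. the LED blink state, stored three times
copies = [state, state, state]

copies[1] ^= 0b0100             # single-event upset flips one bit in one copy
voted = tmr_vote(*copies)
assert voted == state           # the two good copies outvote the bad one

copies = [voted] * 3            # write the voted value back ("scrubbing")
```

Writing the voted value back matters: without it, a second flip in a different copy later on could line up with the first and win the vote.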

[-]

ThrowAwaAlpaca@reddit

Ha the reddit armchair engineers strike again, yeah just a few kilograms to shield the thousands of chips in an airplane /facepalm

[-]

Mattijjah@reddit

How much would this "terrible, decent radiation shield" weigh exactly? Do you realise that it's possible to create an effective yet lightweight shield using appropriate synthetic materials? You do realise that, right, or are you still stuck in the 1970s? ;)

[-]

ThrowAwaAlpaca@reddit

It doesn't exist. 1m of water in all directions maybe.

[-]

Mattijjah@reddit

But we're not talking about shielding devices operating right next to nuclear reactors or in deep space (beyond Earth's magnetic field). The purpose of such shielding is simply to counteract the effects of the thin atmosphere at cruising altitude, which has less ability to retain such particles than at the surface. This isn't really rocket science...

[-]

knobtasticus@reddit

No, we’re talking about extremely high energy particles that we simply don’t have the materials technology to 100% shield against. You talk about it like it’s simple stuff that we’ve been doing for decades. It isn’t and we haven’t. 100% shielding of microelectronics against cosmic radiation is, at this time, impossible. We can’t do it. To get the risk down to almost negligible levels, we have been using large amounts of extremely heavy and thick materials ranging from lead and concrete to water. None of these materials are practical for large-scale use in commercial flight. What we HAVE been doing - to great success - for many decades, is software error-correction. It isn’t a compromise or a failure, it’s a tried-and-true method for mitigating the risks of bit-flip. Just accept that and move on.

[-]

ThrowAwaAlpaca@reddit

Lmao. Yeah nothing like reddit trolls saying it's not rocket science to the actual engineers.

[-]

nalc@reddit

I take it you have no relevant professional experience? As I said, it's not EMI shielding where you just throw some copper foil around it and call it good. It's like "encased in lead or submerged in water" type of radiation hardening you'd need to be fully immune to it. Airborne systems need to be designed as though bit flips are unavoidable.

[-]

Mattijjah@reddit

I have enough experience to see that someone completely screwed up in this matter, and now they're trying to put on a good face...

It's the same as if, with the MCAS on the Boeing, you said, "Oh, come on, let's add support for an additional sensor, and that's it." No, man – you have equipment that has a fundamental design flaw, and for years it "worked" until now... You can't cut corners like that, especially with such a heavily computerised aircraft.

Following that line of thinking, how many more of these "surprises" are there waiting to be discovered?

[-]

Apophyx@reddit

Your "fundamental design flaw" is the very nature of electronics and radiation. Every piece of electronic hardware in the world is susceptible to bit flips. Bit flips are isolated events that are expected and mitigated with redundancy in the code, as many people have already tried to explain to you.

[-]

davispw@reddit

Solution A) A ridiculous amount of shielding to prevent any/all atomic particles from ever flipping a bit in RAM or CPU.

Effectiveness: it’s physically impossible to block all radiation

Cost: weight…lots of weight, something you don’t want in aerospace

Solution B) Checksum all data and calculations in software, with fallback to redundancy if checksums fail

Effectiveness: 99.99xx%

Cost: zero net weight, a few transistors. The real cost is in software engineering and testing (i.e., time and human salaries)…which apparently Airbus skipped in this case. The techniques are well-known and have been proven over decades of aerospace software experience, and battle-hardened in extreme radiation environments.

Why are you arguing for solution A?
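
Solution B in miniature, as a hedged sketch: CRC-32 from Python's `zlib` stands in for whatever checksum a real system would use, and `protect` / `read_with_fallback` are invented names.

```python
import zlib

def protect(payload: bytes):
    """Store the payload together with its CRC-32."""
    return bytearray(payload), zlib.crc32(payload)

def is_intact(record) -> bool:
    """True if the stored data still matches the checksum taken at write time."""
    data, crc = record
    return zlib.crc32(bytes(data)) == crc

def read_with_fallback(primary, backup) -> bytes:
    """Use the primary copy if its checksum matches, otherwise the backup."""
    for record in (primary, backup):
        if is_intact(record):
            return bytes(record[0])
    raise RuntimeError("both copies failed their checksum")

primary = protect(b"elevator=+2.5deg")
backup = protect(b"elevator=+2.5deg")

primary[0][3] ^= 0x10  # flip one bit in the primary copy
print(read_with_fallback(primary, backup))
```

The checksum says which copy is bad, so unlike a plain two-copy comparison this can recover instead of just refusing to continue.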

[-]

induality@reddit

Software correcting for hardware noise is a fundamental part of signals processing. It’s what makes digital computing possible in the first place.

[-]

Mattijjah@reddit

Yes, but if you're designing electronics to operate in harsh environments, such as high radiation levels, your primary focus is on designing and shielding the hardware so that it can withstand as much abuse as possible, and so that you don't have to waste its processing power on extra-redundant error correction—which will ultimately only "make up for the problem" if design errors were made in the hardware...

It's like how we design space probes - what good is correcting errors in software if the hardware isn't resistant to radiation and malfunctions shortly after it reaches orbit?

[-]

davispw@reddit

You’re misunderstanding the issue. In space probes, the radiation is so extreme that transistors will degrade and fail—the radiation eats away at them, and no human can repair them (with the one exception of the Hubble Space Telescope—unlikely to be repeated ever again). The main purpose of the shielding is to reduce the radiation to a manageable level—still higher than airplane electronics will experience at 35,000 foot altitude, but to be less destructive.

You still need software mitigations in space probes—even moreso because there’s no fallback to human pilots, and any loss in attitude control means the probe could permanently lose contact with earth (antennas must be pointing precisely for communication) and/or lose power (solar panels must be angled toward the sun, and electronics and batteries will freeze and be destroyed if continuous thermal management is lost).

Software for space probes is where the techniques for checksumming and redundancy were pioneered and battle-tested. These are the basic techniques that need to be applied to airplane flight control computers.

[-]

Glass_Landscape_8588@reddit

The hardware has apparently inescapable limitations with respect to radiation errors. Prior software versions had apparently mitigated these problems. L104 seems to have failed to do so. This is a failure in system design/testing as they did not include sufficient protections for previously known and accounted-for edge cases/errors.

[-]

tropicbrownthunder@reddit

Probably even the latest aviation computing hardware is by nature underpowered by current standards. Certification and thorough testing take so long that by the time it goes green it's probably at least 2 or 3 hardware generations behind.

[-]

nalc@reddit

There's actually an effect here in that smaller process nodes are inherently more susceptible to radiation induced effects because the transistor gates are smaller. You actually want big old dumb electronics when it comes to tolerating radiation. The tradeoff of course is that with newer generation stuff you can pack in more redundancy and error checking to help compensate for the increased susceptibility to radiation effects.

[-]

ZeePM@reddit

You can see this in the RAD750 processors used in satellites and probes sent into deep space. These still use a 250nm to 150nm photolithography process. Modern state-of-the-art processors in smartphones are pushing 2nm now.

[-]

Stealth100@reddit

Error checking/correction should either be handled natively via IoC or mandated for certification. There’s no justifiable reason for how this went to production other than negligence.

[-]

frisky0330@reddit

Bench testing does differ from practical application. Though in this case Airbus dodged the proverbial bullet, as there could have been potentially fatal consequences during the period since L104 was first implemented. Thankfully, the issue was identified and immediate action taken.

[-]

Tricksilver89@reddit

Well, even if both ELACs went haywire, they still have the SECs (Spoiler Elevator Computers) as a further backup, as those are able to control roll.

[-]

a_lumberjack@reddit

The hardware effects have always been known, the software update broke the error checking that caught it when it happens. Regressions are a pain in the ass.

[-]

Mattijjah@reddit

Okay, that explains a lot, but still – someone must have failed miserably in the testing phase if the change removing these software security features was approved.

This immediately raises another question – did other components in current software versions have this feature, and for example, it was removed at some point, and we haven't found out yet (because we were lucky)?

[-]

Glass_Landscape_8588@reddit

Yes. I'd imagine Airbus will face legal consequences for this incident. Only Airbus and regulators have the capacity to investigate the state of error protection across all their systems.

[-]

curiousengineer601@reddit

What legal issues could they face?

[-]

Loudergood@reddit

All those airlines had to call in extra technicians, no doubt they will want some recompense.

[-]

mig82au@reddit

For a Thales computer? I think not.

[-]

meshreplacer@reddit

You would think the lower-level software, i.e. the OS, would handle memory correction and recovery.

[-]

Jaggedmallard26@reddit

The lower level doesn't know what's critical and blanket applying it would have a gigantic performance cost. It's generally better to have it done at the application software level, where the mitigations can be scaled to importance. Although it runs into the problem that things can slip through the cracks if processes aren't up to scratch. But that's true of any safety-critical bit of software, regardless of whether a system can theoretically mitigate it by forming quorums for every single operation.

[-]

Mattijjah@reddit

If Airbus recommends downgrading the software to the previous version for such a serious problem, that rather speaks for itself: apparently the previous software version had something that is missing in the current one...

[-]

snoromRsdom@reddit

Exactly the opposite. Stop speculating. You are terrible at it.

[-]

SantaGamer@reddit

Why though? Totally possible.

[-]

Mattijjah@reddit

With a properly designed and evaluated design and certification process, such a simple mistake? Well, not really...

[-]

Accomplished-Pound32@reddit

Software development 101: if it ain't broke, fix it until it breaks

[-]

Mattijjah@reddit

Well, in this case, someone didn't apply this rule, since a global rollback had to be done...

[-]

Accomplished-Pound32@reddit

Step 2: when it breaks roll back to previous stable version

Repeat steps 1 & 2 until client runs out of money

[-]

dtp502@reddit

Typically there are checks that validate a received message for errors. These checks are called checksums. There are various implementations of it in software.

Maybe they found that the implementation of the checksum didn’t meet their standards. Kinda scary to think the code reviews didn’t catch this prior to being uploaded to a bunch of aircraft though.
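
For anyone who hasn't seen one, a message checksum looks roughly like this. It's a generic sketch using CRC-32 from Python's standard library; `frame` and `unframe` are made-up names and nothing here reflects the actual ELAC message format.

```python
import struct
import zlib

def frame(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload to the message."""
    return payload + struct.pack(">I", zlib.crc32(payload))

def unframe(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the CRC does not match."""
    if len(message) < 4:
        raise ValueError("message too short to contain a CRC")
    payload = message[:-4]
    received_crc = struct.unpack(">I", message[-4:])[0]
    if zlib.crc32(payload) != received_crc:
        raise ValueError("checksum mismatch: message corrupted")
    return payload

msg = frame(b"AOA=4.2")
assert unframe(msg) == b"AOA=4.2"

corrupted = bytearray(msg)
corrupted[1] ^= 0x04  # one bit flipped in transit
try:
    unframe(bytes(corrupted))
except ValueError as err:
    print("rejected:", err)
```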

[-]

Ramenastern@reddit

Yeah, it seems the rollback was a much quicker solution (because you're going back to an already-approved version that doesn't have the problematic behaviour) than developing and certifying a fixed L104.

[-]

ScaredScorpion@reddit

"Rollback first, debug later" is standard engineering practice. The process failed to catch the issue with L104, you don't roll forward from that.

[-]

snoromRsdom@reddit

It is not a matter of it being quicker. It was the ONLY solution. That version was already certified. A new version L105 with a fix has to be certified. That will take time.

[-]

byteuser@reddit

Apparently the L104 software version removed or relaxed a crucial "sanity check" (slew rate limiter). When a solar particle flips a bit in the cache, say instantly turning a 5° pitch command into a 5000° dive command, the software should reject it as physically impossible because an aileron can't move that fast. Instead, the L104 software blindly trusts the corrupted "scratchpad" data.

This is a fuck-up at the code level that Airbus is blaming on the Sun. That's why they're going back to the L103 version. My guess is they skipped the sanity check for the cases in which turbulence could account for the big jump in values, leaving the system exposed to bit flips in the L1 CPU cache.

What's worse is that this specific solar flare was an event that can affect the plane's computers on the ground, making it potentially disastrous during takeoff or landing.
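
The kind of sanity check described above could look something like this. To be clear, it is a toy sketch: the function names, the 20 ms cycle time and the 30°/s rate limit are all assumptions for illustration, not real Airbus values or logic.

```python
def plausible(previous_deg, commanded_deg, dt_s, max_rate_deg_per_s):
    """True if moving from previous to commanded within dt_s stays within the rate limit."""
    return abs(commanded_deg - previous_deg) <= max_rate_deg_per_s * dt_s

def filter_command(previous_deg, commanded_deg, dt_s=0.02, max_rate_deg_per_s=30.0):
    """Reject a physically impossible jump and hold the last good value instead."""
    if plausible(previous_deg, commanded_deg, dt_s, max_rate_deg_per_s):
        return commanded_deg
    return previous_deg

print(filter_command(5.0, 5.4))     # small step within the limit: accepted
print(filter_command(5.0, 5000.0))  # impossible jump: last good value kept
```

This version rejects the bad sample outright; a classic slew rate limiter would instead clamp the change to the maximum allowed step per cycle.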

[-]

Insaneclown271@reddit

What is the point of this dude's post? They are rolling back due to quite a critical weakness of the update. The tone of this post is weird and not purely informative.

[-]

Zwolfer@reddit

It was written by AI

[-]

aegatech@reddit

ELI5

[-]

IM_REFUELING@reddit

Years after a certain Air France mishap where they stalled a perfectly flyable aircraft into the ocean due to a stunning lack of airmanship and CRM, Airbus developed a software fix to have the plane stop that from happening rather than the airlines actually train their pilots better.

Now hoes are mad that those features are removed and they might actually fly the jet if things get degraded.

[-]

ErIDontKnowMaybe@reddit

There was no software fix for what happened in 447. You could still do the exact same thing if you really wanted. What they introduced was UPRT. Not sure why this person is so emphatic about this while also being completely wrong

[-]

neat_klingon@reddit

Butthurt Böing fan

[-]

syrian_samuel@reddit

> Not sure why this person is so emphatic about this while also being completely wrong

Story of the internet

[-]

Klutzy-Residen@reddit

The great thing about commercial aviation is that you don't look at the pilot errors and conclude that the root cause is the pilots being stupid.

You find all the possible reasons for the pilots making a mistake and make an effort to reduce them so that it doesn't happen in the future.

[-]

TinyCopy5841@reddit

That still relies on having competent pilots in the first place. None of the swiss cheese BS matters at all if every pilot was highly incompetent.

[-]

Klutzy-Residen@reddit

AF447 is a somewhat bad example of this as the pilots did disregard procedure, but there were still changes made to help prevent this and similar accidents from happening in the future.

https://sassofia.com/wp-content/uploads/2025/05/Case-Study-Air-France-Flight-447-%E2%80%93-Automation-Masking-and-Loss-of-Situational-Awareness.pdf page 2/3

- Enhanced Pilot Training for Manual Handling
- Revised Procedures for Airspeed Discrepancies
- Improvements in System Feedback and Design
- Stronger Focus on CRM and Non-Technical Skills
- Regulatory Oversight on Automation Reliance

[-]

TinyCopy5841@reddit

> Enhanced Pilot Training for Manual Handling

This is essentially the idea that 'yep, those pilots were incompetent, we should make sure they actually know how to fly'. Of course this didn't stop Air Asia 8501, which was a very similar incident.

[-]

nicerakc@reddit

The computer would try to prevent the aircraft from stalling or breaking up in certain situations.

[-]

1008oh@reddit

Would be nice if the tweet didn’t reek of AI

Reduces credibility quite a lot

[-]

snoromRsdom@reddit

People whining about AI is so 2025. Move on. The world has and so has AI (ChatGPT just confirmed that, but I can't share it here due to the new rules about AI posts).

[-]

1008oh@reddit

Simping to clankers is something else buddy

[-]

CeleritasLucis@reddit (OP)

People nowadays tend to run their posts through ChatGPT to clean them up. Although I agree with you, it reduces the credibility instead of increasing it with polished language.

[-]

EGLLRJTT24@reddit

> to clean it up

I'd really rather people didn't do this. Makes me trust what are probably extremely reliable sources far less.

[-]

Ustakion@reddit

Here, most people do this now in company email to correct grammar and spelling mistakes, as English is not our first language.

[-]

naggyman@reddit

What was wrong with standard spell check?

[-]

TinyCopy5841@reddit

That it only fixes spelling mistakes and doesn't help with grammar or style? Seems pretty obvious.

[-]

ballimi@reddit

Standard spell check fixes mistakes in words and grammar, it doesn't clean up your sentences. For example, if I directly translate from Dutch I get this sentence:

That works on my nerves.

Which a standard spell check doesn't flag.

[-]

CeleritasLucis@reddit (OP)

ChatGPT also has a very bad confirmation bias. It would pull absolute made up bullshit to support your viewpoint, which you could only detect if you really know about that stuff

[-]

EGLLRJTT24@reddit

Yeah that's been an issue in my line of work since day dot, it's not really gotten any better. If anything it's gotten worse with more people barging in and using this shitty tech.

[-]

Dear_Smoke6964@reddit

Reddit might hate ai but see what happens when you mistake Their for There.

[-]

LawManActual@reddit

I got called a douchebag because I said my phone legit changes things like their/there/they’re sometimes and it’s not really a big deal.

I guess I’m a lazy douche, haha

[-]

SeenSoManyThings@reddit

Self awareness is underrated these days.

[-]

1008oh@reddit

I agree, I use AI quite a lot and even in this type of usage it can throw in some false info so I’m a bit suspicious

[-]

Pop-metal@reddit

Bottom line: this 100%

[-]

IM_REFUELING@reddit

Airbus bros really will do anything but emphasize basic stick and rudder airmanship when things go south.

[-]

ahpc82@reddit

Idk why you are getting downvoted bro lol It’s an entire company hellbent on getting rid of two-pilot cockpits. [1]

Just fly the damned planes already.

[1] https://www.forbes.com/sites/tedreed/2024/06/29/pilot-leader-blasts-airbus-for-backing-single-pilot-flight-deck---its-insane/

[-]

Northern_Blights@reddit

Aren't those images at the bottom reversed? Feels like pitch angle is very much not being limited in the right image.

[-]

balunstormhands@reddit

Solar radiation is important now because we are in a solar maximum right now. The last one was 11 years ago, with the last big CME (a near miss) in 2012; that's long enough for companies to forget the lessons learned.

It's like how I lived in a place that was in drought for 20 years, and when it rained for a week straight, there was lots of flooding. Nothing had been tested or cleaned in decades, so it didn't work.

[-]

Coincidence? Most likely, but an interesting coincidence regardless.

[-]

CeleritasLucis@reddit (OP)

Source : https://x.com/chainsawrocks/status/1994784339784339555?s=61