A320 pilot explains what features were rolled back with the update
Posted by CeleritasLucis@reddit | aviation | View on Reddit | 98 comments
HelloSlowly@reddit
From what I’ve read, L104 was pulled because Airbus noticed it didn’t mitigate errors caused by radiation-induced bit flips (a 1 becoming a 0, or vice versa)
So this feature could easily return in L105 at a later date
Mattijjah@reddit
It's hard to understand why a new version of software introducing new features can expose such a serious problem with radiation-induced bit flips on hardware that's been in service for several years...
nalc@reddit
You can't really shield from single event effects reasonably so mitigation is about doing calculations in different places and comparing the results. Could very well be that the new features weren't implemented with sufficient error checking - either as an oversight or as a way to work around computational capacity limits in older hardware.
NukeRocketScientist@reddit
There are really only two ways to physically lower the likelihood of bit flips from cosmic rays: one, redundancy, and two, increasing the distance between transistors. One downside of making transistors as small as they are nowadays is that they are much more prone to bit flips. When a high-energy cosmic ray proton interacts with a material, you get essentially a "splashing" effect of electrons around the path the proton took through it. With transistors packed as closely as possible, this splash or shockwave of electrons is more likely to flow into a transistor, imparting a charge and causing a 0 to flip to a 1. Redundancy is important because a cosmic ray interacting with one computer chip wouldn't have any effect on another one nearby, and, like you said, it matters for error checking as well.
You could of course try to physically shield the computer, but trying to stop a cosmic ray proton is far easier said than done as they can travel at more than 50% the speed of light depending on energy. It can also be even worse if you don't stop the proton fully as cosmic rays stop both kinetically and electromagnetically causing the cosmic ray to impart more of its energy into the material than if it was at much higher energy. This is called the Bragg peak and is important in proton beam therapy for treating cancer.
Source: I worked in cosmic ray interactions with materials and semiconductors for my undergrad school's CubeSat program.
Eastern_Ad6546@reddit
The other option is to use Actor-Judge algorithm(s) and run multiple computers in parallel. Instead of a single flip causing an error, it then takes identical errors on n/2+1 of the n computers to induce a failure.
This is what SpaceX uses for Falcon 9 and Dragon (human rated). It was one of their big cost reductions and is now found everywhere, even on JPL hardware, which historically used radiation hardening.
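For anyone who wants the voting idea in concrete form, here is a minimal Python sketch. It assumes each of the n computers reports a plain integer result; the function name `majority_vote` is made up for illustration and is not taken from any real flight software.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value reported by a strict majority of channels, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority means more than half of the n channels agree.
    return value if count > len(outputs) // 2 else None

good = 0b0101
flipped = good ^ (1 << 3)  # one channel suffers a single bit flip

# Two healthy channels outvote the corrupted one.
print(majority_vote([good, good, flipped]))
# Three channels that all disagree produce no majority, so the caller must fall back.
print(majority_vote([good, flipped, flipped ^ 1]))
```

A single corrupted channel is simply outvoted; the scheme only fails if more than half of the channels produce the same wrong answer.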
Beni_Stingray@reddit
That was very interesting, thx.
k_marts@reddit
This guy cosmic rays
NukeRocketScientist@reddit
I mean I do have a cat named Proton! Coincidentally, I actually named him before I worked in cosmic ray physics, so it must have just been meant to be.
sittingwith@reddit
Thank you for sharing your knowledge.
NukeRocketScientist@reddit
Happy to!
Mattijjah@reddit
Fine, but still – if it's a matter of insufficient shielding, then this problem should have surfaced long ago – especially considering the number of machines in active service.
If we're saying that "cosmic radiation" is causing these problems on this equipment, then the logical conclusion is that there's actually a hardware problem, and updating the firmware up or down shouldn't make much difference. At least as long as you don't radically change the way you use the equipment (e.g., previous versions of the software had some built-in error correction, which has now been removed because, for example, someone decided they didn't have enough space for their feature...).
nalc@reddit
Sufficient shielding isn't really a thing for this - it's radiation, not electromagnetic interference, so typical conductive shielding doesn't do much
Yeah, the hardware problem is that small delicate electronics are sensitive to radiation
It does. Let's say you store a value once in memory and it gets bit flipped. It will cause a problem. Now let's say you store it twice and check that both values match before doing anything. It won't cause a problem.
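A rough Python sketch of the "store it twice and check" idea, purely as an illustration. The class name `DualStored` and the pitch value are hypothetical, and real avionics code would not look like this.

```python
class DualStored:
    """Keeps two copies of a value and refuses to return it if they differ."""

    def __init__(self, value):
        self.copy_a = value
        self.copy_b = value

    def read(self):
        # A mismatch means at least one copy was corrupted since the write.
        if self.copy_a != self.copy_b:
            raise ValueError("copies disagree: possible bit flip")
        return self.copy_a

pitch = DualStored(5)
pitch.copy_a ^= 1 << 10  # simulate a single bit flip in one copy

try:
    pitch.read()
except ValueError as err:
    print("rejected:", err)
```

Note that two copies only detect the problem. They cannot tell which copy is the good one, which is why a third copy and a vote is the usual next step.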
Mattijjah@reddit
Shielding is quite important – if a firmware change alone can create such a problem, it means that the electronics are not properly shielded in the first place. Any software features intended to mitigate the problem should be merely a supplement/redundancy, and not, as is evident in this case, the main line of defence...
knobtasticus@reddit
You seem to be missing a fundamental characteristic of physical shielding - weight. Effective cosmic radiation shielding is a multi-layered, heavy process using thick, dense materials and/or substantial amounts of hydrogen-rich materials like water. And even with every possible physical barrier in place, there’s no guarantee of imperviousness. Furthermore, many materials give off secondary and just as damaging radiation when struck by cosmic radiation.
Weight is a substantial consideration for commercial aircraft so, the correct and sensible tactic for mitigating the risk of cosmic-induced bit-flips is a hybrid of physical barrier and software-based error correction. The most recent software update impacted the effectiveness of this error-correction. The temporary rollback has restored that effectiveness.
Mattijjah@reddit
Okay, but you seem to be overlooking another, even more important characteristic – reliability.
If you have a mission-critical device, on which the lives of the crew and passengers literally depend, you can't make the kind of compromise where you build it de facto defective and then try to patch the problem with software.
How much will this additional shielding weigh? A few kilograms at most? Is it really that much?
RdPirate@reddit
We make cosmic radiation detectors under mountains cause cosmic rays care not. So about a planets worth of weight?
Mattijjah@reddit
Yes, in a situation where we want to detect neutrinos and need to filter out 99.9999% of the remaining noise :)
Here, a relatively simple (and really lightweight) shield will suffice, weighing significantly less than a single overloaded passenger's checked luggage... ;)
RdPirate@reddit
But those particles also flip bits. And to you software error correction with similar failure rates was unacceptable, so.....
Mattijjah@reddit
Or properly designed hardware that won't be as susceptible to bit flipping caused by cosmic rays. ;)
That's called fail-safe design. If a simple software update, which happened to lack some protection from previous versions, reveals such potentially catastrophic hardware design flaws, then sorry, but no...
LandscapePenguin@reddit
Please, enlighten us. How do you design a piece of hardware to be immune from cosmic-ray bit-flipping? Let's take a simple flashing LED as an example. Be specific so we can all easily understand exactly how to properly design this hardware.
Mattijjah@reddit
Okay. Designing radiation-safe hardware relies on fault tolerance, not an impossible physical shield. For your "blinking LED", this is achieved through Triple Modular Redundancy (TMR), where three identical chips store the same state. A voting chip selects the majority result, immediately correcting the SEU at the hardware level before the error affects the output. In the ELAC case, the problem was that new software overrode this built-in TMR defence, giving a random cosmic particle time to trigger a fatal error. This is a process error, not a fundamental material defect.
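The voter described above is a hardware circuit, but its logic can be simulated in a few lines of Python. This is only a sketch of bitwise majority voting over three copies; the name `tmr_vote` and the example state are assumptions, and nothing here reflects the actual ELAC design.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)

state = 0b1010_0001           # the value all three copies should hold
corrupted = state ^ (1 << 6)  # a single event upset flips bit 6 in one copy

# The two intact copies outvote the corrupted one on every bit.
print(bin(tmr_vote(state, corrupted, state)))
```

Because the vote is taken per bit, flips that hit different bit positions in different copies are also corrected. The vote is only wrong when the same bit is flipped in two of the three copies.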
ThrowAwaAlpaca@reddit
Ha the reddit armchair engineers strike again, yeah just a few kilograms to shield the thousands of chips in an airplane /facepalm
Mattijjah@reddit
How much would this "terrible, decent radiation shield" weigh exactly? Do you realise that it's possible to create an effective yet lightweight shield using appropriate synthetic materials? You do realise that, right, or are you still stuck in the 1970s? ;)
ThrowAwaAlpaca@reddit
It doesn't exist. 1m of water in all directions maybe.
Mattijjah@reddit
But we're not talking about shielding devices operating right next to nuclear reactors or in deep space (beyond Earth's magnetic field). The purpose of such shielding is simply to counteract the effects of the thin atmosphere at cruising altitude, which has less ability to retain such particles than at the surface. This isn't really rocket science...
knobtasticus@reddit
No, we’re talking about extremely high energy particles that we simply don’t have the materials technology to 100% shield against. You talk about it like it’s simple stuff that we’ve been doing for decades. It isn’t and we haven’t. 100% shielding of microelectronics against cosmic radiation is, at this time, impossible. We can’t do it. To get the risk down to almost negligible levels, we have been using large amounts of extremely heavy and thick materials ranging from lead and concrete to water. None of these materials are practical for large-scale use in commercial flight. What we HAVE been doing - to great success - for many decades, is software error-correction. It isn’t a compromise or a failure, it’s a tried-and-true method for mitigating the risks of bit-flip. Just accept that and move on.
ThrowAwaAlpaca@reddit
Lmao. Yeah nothing like reddit trolls saying it's not rocket science to the actual engineers.
nalc@reddit
I take it you have no relevant professional experience? As I said, it's not EMI shielding where you just throw some copper foil around it and call it good. It's like "encased in lead or submerged in water" type of radiation hardening you'd need to be fully immune to it. Airborne systems need to be designed as though bit flips are unavoidable.
Mattijjah@reddit
I have enough experience to see that someone completely screwed up in this matter, and now they're trying to put on a good face...
It's the same as if, with the MCAS on the Boeing, you said, "Oh, come on, let's add support for an additional sensor, and that's it." No, man – you have equipment that has a fundamental design flaw, and for years it "worked" until now... You can't cut corners like that, especially with such a heavily computerised aircraft.
Following that line of thinking, how many more of these "surprises" are there waiting to be discovered?
Apophyx@reddit
Your "fundamental design flaw" is the very nature of electronics and radiation. Every piece of electronic hardware in the world is susceptible to bit flips. Bit flips are isolated events that are expected and mitigated with redundancy in the code, as many people have already tried to explain to you.
davispw@reddit
Solution A) A ridiculous amount of shielding to prevent any/all atomic particles from ever flipping a bit in RAM or CPU.
Effectiveness: it’s physically impossible to block all radiation
Cost: weight…lots of weight, something you don’t want in aerospace
Solution B) Checksum all data and calculations in software, with fallback to redundancy if checksums fail
Effectiveness: 99.99xx%
Cost: zero net weight, a few transistors. The real cost is in software engineering and testing (i.e., time and human salaries)…which apparently Airbus skipped in this case. The techniques are well-known and have been proven over decades of aerospace software experience, and battle-hardened in extreme radiation environments.
Why are you arguing for solution A?
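Solution B can be sketched in Python roughly like this, assuming CRC-32 (`zlib.crc32` from the standard library) as the checksum and two stored copies as the redundancy. The helper names `protect` and `read` and the payload are invented for the example.

```python
import zlib

def protect(payload: bytes, copies: int = 2):
    """Store several copies of the payload, each paired with its CRC-32."""
    return [(bytearray(payload), zlib.crc32(payload)) for _ in range(copies)]

def read(records) -> bytes:
    """Return the first copy whose checksum still matches, else raise."""
    for data, crc in records:
        if zlib.crc32(bytes(data)) == crc:
            return bytes(data)
    raise RuntimeError("all copies failed their checksum")

records = protect(b"pitch=5.0")
records[0][0][0] ^= 0x04  # flip one bit in the primary copy

# The primary copy fails its checksum, so the read falls back to the second copy.
print(read(records))
```

The cost is some extra memory and CPU time per access, which fits the "zero net weight" point above.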
induality@reddit
Software correcting for hardware noise is a fundamental part of signals processing. It’s what makes digital computing possible in the first place.
Mattijjah@reddit
Yes, but if you're designing electronics to operate in harsh environments, such as high radiation levels, your primary focus is on designing and shielding the hardware so that it can withstand as much abuse as possible, and so that you don't have to waste its processing power on extra-redundant error correction—which will ultimately only "make up for the problem" if design errors were made in the hardware...
It's like how we design space probes - what good is correcting errors in software if the hardware isn't resistant to radiation and malfunctions shortly after it reaches orbit?
davispw@reddit
You’re misunderstanding the issue. In space probes, the radiation is so extreme that transistors will degrade and fail—the radiation eats away at them, and no human can repair them (with the one exception of the Hubble Space Telescope—unlikely to be repeated ever again). The main purpose of the shielding is to reduce the radiation to a manageable level—still higher than airplane electronics will experience at 35,000 foot altitude, but to be less destructive.
You still need software mitigations in space probes—even moreso because there’s no fallback to human pilots, and any loss in attitude control means the probe could permanently lose contact with earth (antennas must be pointing precisely for communication) and/or lose power (solar panels must be angled toward the sun, and electronics and batteries will freeze and be destroyed if continuous thermal management is lost).
Software for space probes is where the techniques for checksumming and redundancy were pioneered and battle-tested. These are the basic techniques that need to be applied to airplane flight control computers.
Glass_Landscape_8588@reddit
The hardware has apparently inescapable limitations with respect to radiation errors. Prior software versions had apparently mitigated these problems. L104 seems to have failed to do so. This is a failure in system design/testing as they did not include sufficient protections for previously known and accounted for edge cases/errors.
tropicbrownthunder@reddit
Aviation computing hardware is probably underpowered by current standards by its very nature. Certification and thorough testing take so long that by the time something goes green it is probably at least 2 or 3 hardware generations behind.
nalc@reddit
There's actually an effect here in that smaller process nodes are inherently more susceptible to radiation induced effects because the transistor gates are smaller. You actually want big old dumb electronics when it comes to tolerating radiation. The tradeoff of course is that with newer generation stuff you can pack in more redundancy and error checking to help compensate for the increased susceptibility to radiation effects.
ZeePM@reddit
You can see this in the RAD750, which is used in satellites and probes sent into deep space. It still uses a 250nm to 150nm photolithography process. Modern state-of-the-art processors in smartphones are pushing 2nm now.
Stealth100@reddit
Error checking/correction should either be handled natively via IoC or mandated for certification. There’s no justifiable reason for how this went to production other than negligence.
frisky0330@reddit
Bench testing does differ from practical application. Though in this case Airbus dodged the proverbial bullet, as there could have been potentially fatal consequences during the period since L104 was first implemented. Thankfully, the issue was identified and immediate action taken.
Tricksilver89@reddit
Well, even if both ELACs went haywire, they still have the SECs (Spoiler Elevator Computers) as a further backup, since those are able to control roll.
a_lumberjack@reddit
The hardware effects have always been known, the software update broke the error checking that caught it when it happens. Regressions are a pain in the ass.
Mattijjah@reddit
Okay, that explains a lot, but still – someone must have failed miserably in the testing phase if the change removing these software security features was approved.
This immediately raises another question – did other components once have this kind of protection, only for it to be removed at some point in the current software versions, and we just haven't found out yet (because we were lucky)?
Glass_Landscape_8588@reddit
Yes. I'd imagine Airbus will face legal consequences for this incident. Only Airbus and regulators have the capacity to investigate the state of error protection across all their systems.
curiousengineer601@reddit
What legal issues could they face?
Loudergood@reddit
All those airlines had to call in extra technicians, no doubt they will want some recompense.
mig82au@reddit
For a Thales computer? I think not.
meshreplacer@reddit
You would think the lower level software ie OS would handle memory correction and recovery.
Jaggedmallard26@reddit
The lower level doesn't know what's critical and blanket applying it would have a gigantic performance cost. It's generally better to have it done in the application software, where the level of mitigation can be scaled to importance. Although it runs into the problem that things can slip through the cracks if processes aren't up to scratch. But that's true of any safety critical bit of software regardless of if a system can theoretically mitigate it by forming quorums for every single operation.
Mattijjah@reddit
If Airbus recommends downgrading the software to the previous version in case of such a serious problem, well, that speaks for itself: apparently the previous software version had something that is missing in the current one...
snoromRsdom@reddit
Exactly the opposite. Stop speculating. You are terrible at it.
SantaGamer@reddit
Why though? Totally possible.
Mattijjah@reddit
With a properly designed and evaluated design and certification process, such a simple mistake? Well, not really...
Accomplished-Pound32@reddit
Software development 101: if it ain't broke, fix it until it breaks
Mattijjah@reddit
Well, in this case, someone didn't apply this rule, since a global rollback had to be done...
Accomplished-Pound32@reddit
Step 2: when it breaks roll back to previous stable version
Repeat steps 1 & 2 until client runs out of money
dtp502@reddit
Typically there are checks that validate a received message for errors. These checks are called checksums. There are various implementations of it in software.
Maybe they found that the implementation of the checksum didn’t meet their standards. Kinda scary to think the code reviews didn’t catch this prior to being uploaded to a bunch of aircraft though.
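As a minimal Python illustration of a message checksum, here is one way a sender could append a CRC-32 and a receiver could verify it. The framing (a 4-byte big-endian CRC at the end) and the function names `frame` and `unframe` are assumptions for the example, not any real avionics protocol.

```python
import struct
import zlib

def frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))

def unframe(message: bytes) -> bytes:
    """Verify the trailing CRC-32 and return the payload, or raise ValueError."""
    if len(message) < 4:
        raise ValueError("message too short to contain a checksum")
    payload = message[:-4]
    (expected,) = struct.unpack(">I", message[-4:])
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch: message corrupted")
    return payload

msg = bytearray(frame(b"elevator=2.5"))
msg[3] ^= 0x10  # corrupt one bit in transit

try:
    unframe(bytes(msg))
except ValueError as err:
    print("dropped:", err)
```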
Ramenastern@reddit
Yeah, it seems the rollback was a much quicker solution (because you're going back to an already-approved version that doesn't have the problematic behaviour) than developing and certifying a fixed L104.
ScaredScorpion@reddit
"Rollback first, debug later" is standard engineering practice. The process failed to catch the issue with L104, you don't roll forward from that.
snoromRsdom@reddit
It is not a matter of it being quicker. It was the ONLY solution. That version was already certified. A new version L105 with a fix has to be certified. That will take time.
byteuser@reddit
Apparently the L104 software version removed or relaxed a crucial "sanity check" (Slew Rate Limiter). When a solar particle flips a bit in the cache, say turning a 5° pitch into a 5000° dive command instantly, the software should reject it as physically impossible because an aileron can't move that fast. Instead, the L104 software blindly trusts the corrupted "scratchpad" data.
This is a fuck up at code level that Airbus is blaming on the Sun. That's why they're going back to the L103 version. My guess is they skipped the sanity check for the cases in which turbulence could account for the big jump in values, leaving the system exposed to bit flips in the L1 CPU cache.
What's worse is that the specific solar flare was an event that can affect the plane's computers on the ground, making it potentially disastrous during takeoff or landing.
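To show what a rate-based sanity check of the kind described above might look like, here is a hedged Python sketch. It only illustrates the general technique the comment speculates about. The function name `accept_command`, the 30 deg/s limit and the 20 ms cycle time are all made-up values, not anything from Airbus or Thales.

```python
def accept_command(previous_deg: float, requested_deg: float,
                   dt_s: float, max_rate_deg_per_s: float = 30.0) -> float:
    """Return the requested value if the implied rate is plausible, else hold the previous one."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    rate = abs(requested_deg - previous_deg) / dt_s
    # A NaN rate also fails this comparison, so corrupted NaN inputs are rejected too.
    return requested_deg if rate <= max_rate_deg_per_s else previous_deg

# A small change over one 20 ms cycle is accepted.
print(accept_command(5.0, 5.2, dt_s=0.02))
# A jump from 5 to 5000 degrees in one cycle is physically impossible, so the old value is held.
print(accept_command(5.0, 5000.0, dt_s=0.02))
```

A real limiter might clamp the change to the maximum allowed step instead of rejecting it outright; rejecting is used here because that is what the comment describes.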
Insaneclown271@reddit
What is the point of this dude's post? They are rolling back due to quite a critical weakness of the update. The tone of this post is weird and not purely informative.
Zwolfer@reddit
It was written by AI
aegatech@reddit
ELI5
IM_REFUELING@reddit
Years after a certain Air France mishap where they stalled a perfectly flyable aircraft into the ocean due to a stunning lack of airmanship and CRM, Airbus developed a software fix to have the plane stop that from happening rather than the airlines actually train their pilots better.
Now hoes are mad that those features are removed and they might actually fly the jet if things get degraded.
ErIDontKnowMaybe@reddit
There was no software fix for what happened in 447. You could still do the exact same thing if you really wanted. What they introduced was UPRT. Not sure why this person is so emphatic about this while also being completely wrong
neat_klingon@reddit
Butthurt Böing fan
syrian_samuel@reddit
Story of the internet
Klutzy-Residen@reddit
The great thing about commercial aviation is that you don't look at the pilot errors and conclude that the root cause is the pilots being stupid.
You find all the possible reasons for the pilots making a mistake and make an effort to reduce them so that it doesn't happen in the future.
TinyCopy5841@reddit
That still relies on having competent pilots in the first place. None of the swiss cheese BS matters at all if every pilot was highly incompetent.
Klutzy-Residen@reddit
AF447 is a somewhat bad example of this as the pilots did disregard procedure, but there were still changes made to help prevent this and similar accidents from happening in the future.
https://sassofia.com/wp-content/uploads/2025/05/Case-Study-Air-France-Flight-447-%E2%80%93-Automation-Masking-and-Loss-of-Situational-Awareness.pdf page 2/3
- Enhanced Pilot Training for Manual Handling
- Revised Procedures for Airspeed Discrepancies
- Improvements in System Feedback and Design
- Stronger Focus on CRM and Non-Technical Skills
- Regulatory Oversight on Automation Reliance
TinyCopy5841@reddit
This is essentially the idea that 'yep, those pilots were incompetent, we should make sure they actually know how to fly'. Of course this didn't stop Air Asia 8501, which was a very similar incident.
nicerakc@reddit
The computer would try to prevent the aircraft from stalling or breaking up in certain situations.
1008oh@reddit
Would be nice if the tweet didn’t reek of AI
Reduces credibility quite a lot
snoromRsdom@reddit
People whining about AI is so 2025. Move on. The world has and so has AI (ChatGPT just confirmed that, but I can't share it here due to the new rules about AI posts).
1008oh@reddit
Simping to clankers is something else buddy
CeleritasLucis@reddit (OP)
People nowadays tend to run their posts through ChatGPT to clean them up. Although I agree with you, it reduces the credibility instead of increasing it with polished language
EGLLRJTT24@reddit
I'd really rather people didn't do this. Makes me trust what are probably extremely reliable sources far less.
Ustakion@reddit
Here, most people do this now in company email to correct grammar and spelling mistakes, as English is not our first language
naggyman@reddit
What was wrong with standard spell check?
TinyCopy5841@reddit
That it only fixes spelling mistakes and doesn't help with grammar or style? Seems pretty obvious.
ballimi@reddit
Standard spell check fixes mistakes in words and grammar, it doesn't clean up your sentences. For example, if I directly translate from Dutch I get this sentence:
Which a standard spell check doesn't flag.
CeleritasLucis@reddit (OP)
ChatGPT also has a very bad confirmation bias. It would pull absolute made up bullshit to support your viewpoint, which you could only detect if you really know about that stuff
EGLLRJTT24@reddit
Yeah that's been an issue in my line of work since day dot, it's not really gotten any better. If anything it's gotten worse with more people barging in and using this shitty tech.
Dear_Smoke6964@reddit
Reddit might hate ai but see what happens when you mistake Their for There.
LawManActual@reddit
I got called a douchebag because I said my phone legit changes things like their/there/they’re sometimes and it’s not really a big deal.
I guess I’m a lazy douche, haha
SeenSoManyThings@reddit
Self awareness is underrated these days.
1008oh@reddit
I agree, I use AI quite a lot and even in this type of usage it can throw in some false info so I’m a bit suspicious
Pop-metal@reddit
Bottom line: this 100%
IM_REFUELING@reddit
Airbus bros really will do anything but emphasize basic stick and rudder airmanship when things go south.
ahpc82@reddit
Idk why you are getting downvoted bro lol It’s an entire company hellbent on getting rid of two-pilot cockpits. [1]
Just fly the damned planes already.
[1] https://www.forbes.com/sites/tedreed/2024/06/29/pilot-leader-blasts-airbus-for-backing-single-pilot-flight-deck---its-insane/
Northern_Blights@reddit
Aren't those images at the bottom reversed? Feels like pitch angle is very much not being limited in the right image.
balunstormhands@reddit
Solar radiation is important now because we are in a solar maximum right now. The last one was 11 years ago, with the last big CME (a near miss) in 2012. That's long enough for companies to forget the lessons learned.
It's like how I lived in a place that was in drought for 20 years, and when it rained for a week straight, there was a lot of flooding. Nothing had been tested or cleaned in decades, so it didn't work.
MikeInPajamas@reddit
Would PALAL have helped AF447?
Antique_Change2805@reddit
Maybe
rhymeandreasons@reddit
having the "DUAL INPUT" not be inhibited while a "STALL STALL" was going off would have helped 1000%
dnuohxof-2@reddit
So I’m very much a layman, and am curious. The issue they’re fixing with this downgrade is the radiation-induced bit flip that could cause an unintended nose pitch down scenario.
One of the features of the FW they’re moving away from is PALAL, which prevents excessive pitch up.
Coincidence? Most likely, but an interesting coincidence regardless.
CeleritasLucis@reddit (OP)
Source : https://x.com/chainsawrocks/status/1994784339784339555?s=61