Airbus issues major A320 recall after flight-control incident
Posted by MrAeronaut@reddit | aviation | View on Reddit | 104 comments
Kanyiko@reddit
Airbus today: "We had this issue on one aircraft under a particularly rare set of circumstances, so as a manufacturer we find ourselves obliged to ask all of our customers to ground their aircraft until we have had time to implement a fix for this."
Boeing in 2019: "We're pretty sure it's completely a coincidence those two smoking holes in the ground were both caused by Boeing 737-MAX aircraft, and we're pretty sure both are pilot error so we don't see any reason why we should ground these aircraft. Even the FAA says so."
yourlocalFSDO@reddit
Airbus and EASA have certainly known about this for weeks. It’s not a coincidence that the emergency AD wasn’t issued until they already had a fix for it
evac95@reddit
The fix is reverting to the previous software version, so it’s not like weeks would’ve been spent developing new software to fix it. Likely a bug was introduced in the latest software version and it was just a case of identifying it.
yourlocalFSDO@reddit
The question is why they waited until today to ground the fleet when they’ve known about the issue for over 3 weeks. If it was worthy of grounding, they should’ve done so when they found the problem.
MrDannyProvolone@reddit
"intense solar radiation may corrupt data critical to the functioning of flight controls"
I've never heard of anything like this before, outside of spacecraft. Does anyone know of any other incidents with a cause even remotely close to this?
cas4076@reddit
We've been going through a period of much increased radiation, and given that aircraft operate at altitude, this does not surprise me. I'm guessing the fix is first a software one where they add error checking, and the longer-term one is more shielding.
This may be a new risk to deal with given the increase in solar activity, and no doubt we will see more impacts, not just Airbus and not just commercial aircraft.
BoringBob84@reddit
Avionics manufacturers are not going to encase their equipment in lead. The solution is mitigation in hardware and software.
N43N@reddit
Shielding is one way to mitigate this via hardware, yes.
BoringBob84@reddit
OK, then encase it in steel or a water bath that is 10 cm thick. That will stop stray neutrons. /sarcasm
N43N@reddit
Depending on what signals we are talking about, a little bit of aluminum foil at the right place could already be enough.
BoringBob84@reddit
Stray neutrons will penetrate that easily.
FAA - Single Event Effects Mitigation Techniques Report
raptor217@reddit
Neutrons don’t really interact with electronics. They only seem to degrade optoisolators (unless it’s a nuclear blast but that’s a different ballgame).
BoringBob84@reddit
Read the report.
raptor217@reddit
Aluminum foil does nothing to protons/heavy ions.
Due to the Bragg peak, electronics are only affected in a narrow window. If the energy of a particle is too high, it punches right through without really depositing much energy.
You need the shield to actually stop particles, slowing them down can actually make it worse. Shielding (much thicker than foil) will reduce the flux of particles that can cause an upset, but never eliminate it (without feet of material).
myusernameblabla@reddit
Yeah, I doubt satellites are shielded with lead.
raptor217@reddit
You actually cannot totally shield (without feet of concrete). If you have aluminum (or anything really), the most common shielding metal, a proton hitting it can induce a new particle which travels further.
You end up with a particle that would travel all the way through and not flip a bit (because it cannot deposit energy) inducing a particle that can.
Blazah@reddit
I knew this weird aurora activity down to Florida would come with issues!!
Now what happens when we suddenly have cancer in 12 months, everyone outside taking pictures of it, I'm at home in my basement under my solar blanket !
RickMuffy@reddit
I studied this when I got my Aerospace degree. It's tiny particles that literally flip a computer bit from a 0 to a 1 or vice versa and screw up the computer program. I'm guessing that's what happened here.
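To make the "flip a bit from a 0 to a 1" idea concrete, here is a minimal Python sketch. The variable name `pitch` and the chosen bit position are purely illustrative assumptions; nothing here reflects how any real flight computer stores its data.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted (XOR with a one-hot mask)."""
    return value ^ (1 << bit)

# Hypothetical example: an integer holding a small reading.
pitch = 5
corrupted = flip_bit(pitch, 12)    # a single upset in bit 12
print(pitch, "->", corrupted)      # 5 -> 4101

# Flipping the same bit again restores the original value.
print(flip_bit(corrupted, 12))     # 5
```

The point is that one flipped bit can change a stored number by a large amount, depending on which bit is hit.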
Perfect-Ad-1774@reddit
Didn't know that, thanks for the knowledge sir. 🧐
whiteridge@reddit
If you enjoyed that, you’ll like this:
How An Ionizing Particle From Outer Space Helped A Mario Speedrunner Save Time
https://www.thegamer.com/how-ionizing-particle-outer-space-helped-super-mario-64-speedrunner-save-time/
I_DRINK_URINE@reddit
That's unfortunately false. That's been proven to be caused by a bug in the game, nothing to do with space particles.
whiteridge@reddit
Interesting! Where did you see that?
I_DRINK_URINE@reddit
https://youtu.be/vj8DzA9y8ls?si=7ZC0U8fhh7C-Zbq4
whiteridge@reddit
Interesting video. Your claim that “It's been proven to be caused by a bug in the game” doesn’t match what’s said in the video though.
The video says it’s never been proven what caused the bit flip, and whether it was even a bit flip. The cosmic ray theory is just the most fun theory.
peepay@reddit
And likely affected the outcome of Swiss elections too.
(The result was changed exactly by 4096 votes.)
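The 4096 figure stands out because 4096 = 2^12, which is what a single flipped bit would add to a stored count. A rough Python sketch of that check follows; the vote counts are made-up numbers for illustration, not figures from any real election.

```python
def single_bit_difference(a: int, b: int):
    """Return the bit position if a and b differ in exactly one bit, else None."""
    diff = a ^ b
    if diff != 0 and diff & (diff - 1) == 0:   # exactly one bit set
        return diff.bit_length() - 1
    return None

expected = 514              # hypothetical true count
reported = expected + 4096  # hypothetical reported count
print(single_bit_difference(expected, reported))  # 12
print(single_bit_difference(514, 520))            # None
```

A match like this is only circumstantial: it shows a result is consistent with one flipped bit, not what caused it.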
oh_dear_now_what@reddit
You're thinking of a 2003 incident in Belgium. https://en.wikipedia.org/wiki/Electronic_voting_in_Belgium
The Swiss had a vote totalling error more recently, in 2023, but it was probably a more conventional software bug. https://www.swissinfo.ch/eng/business/swiss-election-results-revised-after-vote-counting-error/48925322
peepay@reddit
Ah, thanks
A-Delonix-Regia@reddit
Belgian, but your point still stands.
Tiny-Composer-6641@reddit
The 'it's only a typo' crowd don't see what the big deal is here
peroxidase2@reddit
So more speed tape for shielding?
ZaryaBubbler@reddit
We're in a solar high at the moment, have been for a while, they last about 7ish years so we could see more of this
Unbaguettable@reddit
The increase in the impacts (such as the New Glenn scrub) is just due to us being at the peak of the Sun's solar cycle. In a couple of years we should have a lot less, as the Sun will be calmer.
BoringBob84@reddit
Regulations require manufacturers to design electronics to be robust against "Single Event Upset."
The A320 does not currently comply - thus, the EAD.
Hiddencamper@reddit
I work in power generation and we will get geomagnetic storm warnings and I have seen grid perturbations, line trips. Infrequently, but it happens. Usually it’s just random stuff but it’s clear the gremlins come out during the major solar storms.
Imtherealwaffle@reddit
damn airbus having mario bitflip problems
ArsErratia@reddit
Is there an actual direct source for this claim, or is it a possible cause that's being reported unclearly by the journalists?
There are many causes of Single-Event upsets other than space radiation. And the chance of it happening multiple times to multiple planes in the exact same manner is questionable.
raptor217@reddit
Agreed 100%. Unless they have radiation test data, this is often something that requires you to eliminate every other possibility first.
DickBatman@reddit
Yeah there was a Mario 64 speedrunning incident like this once. Well, a flipped bit from solar radiation is one of the speculated causes
spsteve@reddit
This might explain a number of uncontrolled atitude changes that have occurred over years with little to no explanation.
Ungrammaticus@reddit
It's not impossible (nothing is impossible) but SEEs are very rare, even in aircraft. And SEEs that hit the same data with the same effect are exceedingly rare.
The probability of this affecting many other flights is low.
Just_Another_Scott@reddit
Intense solar storms have been known to take down the electrical grid and affect ground based electronics. One of the largest power outages in North America was caused by a geomagnetic storm.
https://www.astronomy.com/science/a-large-solar-storm-could-knock-out-the-internet-and-power-grid-an-electrical-engineer-explains-how/
raptor217@reddit
While these storms often coincide with higher fluxes of protons and heavy ions, they do totally different things. A geomagnetic storm will not affect an airplane, aside from compass readings. They affect conductive surfaces that are miles long.
But solar flares (which often cause these storms) can affect planes, but not the energy grid.
raptor217@reddit
Oh hey, actually close to my area of expertise. The rates in atmosphere are very low, but not zero. They also aren’t zero on the ground (but they fall below things like failure rate of electronics).
Redundancy should have caught this. With things like redundant flight computers/sensors, it’s categorically impossible to have an upset in multiple physical units, in atmosphere.
I wonder if they experienced corruption of their flight code storage memory, but that should be implemented with error correcting code and checksums.
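As a rough illustration of the checksum idea mentioned above, this Python sketch stores a CRC-32 next to a byte string and re-checks it after one bit is flipped. The "image" contents and function names are assumptions for the example; it uses the standard library's `zlib.crc32` and says nothing about how any real avionics storage is protected.

```python
import zlib

def store(payload: bytes):
    """Return the payload together with its CRC-32 checksum."""
    return payload, zlib.crc32(payload)

def is_intact(payload: bytes, expected_crc: int) -> bool:
    """Recompute the CRC-32 and compare it with the stored one."""
    return zlib.crc32(payload) == expected_crc

image, crc = store(b"hypothetical flight code image")

damaged = bytearray(image)
damaged[3] ^= 0x10                      # flip one bit in byte 3

print(is_intact(image, crc))            # True
print(is_intact(bytes(damaged), crc))   # False
```

A checksum like this only detects corruption. Repairing it needs an error-correcting code or a clean copy to reload from.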
BoringBob84@reddit
Wikipedia - "Single Event Upset / Error"
ModsHaveHUGEcocks@reddit
Happened on a Qantas A330 years ago I believe
Beneficial_Dish8637@reddit
Here’s an interesting video on this. It did happen before in 2008.
https://youtu.be/AaZ_RSt0KP8?si=IuNpauWMlmTtVby0
barbarossapl@reddit
On a United A321neo that is delayed for a software update/check. The pilot said maintenance received the alert about 30 mins ago to check all affected aircraft.
MoiraRose2021@reddit
I’m scheduled to fly out on an A321neo tomorrow at 6:30 am. Wonder how that will work out….
Blazah@reddit
how'd it go?
MoiraRose2021@reddit
No issues at all! Smooth sailing.
barbarossapl@reddit
Deplaned now. Sounds like maintenance and engineering doesn’t actually know how to complete the check/upgrade/downgrade and are figuring it out on the fly
Efficient_Sky5173@reddit
On the fly?!? It should be done on the ground!
weristjonsnow@reddit
Oh Jesus, well this sounds like a shitshow
barbarossapl@reddit
Pushed to 11:30p ET (original 4:45p departure)
asclepi@reddit
That's surprising. Airbus has included a detailed walkthrough on the procedure in the AOT they published.
N651EB@reddit
I’m scheduled to fly on a UA A321neo on Sunday. I’ll be very curious if United can address this and get you guys in the air tonight or if this is going to be a much longer downtime proposition for the fleet. Good luck, and keep us posted here if you would!
Frosty1887@reddit
Just got off a neo 3 minutes ago, no delay on ours via united!
Commercial-Run-3737@reddit
From EAD Issued by Airbus:
An Airbus A320 aeroplane recently experienced an uncommanded and limited pitch down event. The autopilot remained engaged throughout the event, with a brief and limited loss of altitude, and the rest of the flight was uneventful. Preliminary technical assessment done by Airbus identified a malfunction of the affected ELAC as possible contributing factor. This condition, if not corrected, could lead in the worst-case scenario to an uncommanded elevator movement that may result in exceeding the aircraft’s structural capability.
BoringBob84@reddit
EASA issues EADs, not manufacturers.
TacticalSniper@reddit
I was all fine with this until
ArsErratia@reddit
I mean, isn't this technically true of any uncommanded elevator movement? They say in the text it's a worst-case scenario.
BoringBob84@reddit
This is similar to 737 MCAS. All the crew has to do is to turn it off.
someFAsarecrazy@reddit
In the A320 there’s envelope protections independent of the AP. If the ELAC is confused, or malfunctioning, and thinks the pitch is -35 and it actually is 5 deg, it will try to correct it, AP on or off.
You can turn the ELACs off manually of course, but what should happen is they should detect a problem and turn off on their own, which is what you want.
CashKeyboard@reddit
Going by my basic understanding of the system, the FMGC/FMGS and thus the autopilot would not be able to override basic flight envelope protections. The ELAC further down the chain would enforce those and change the law if it detected an automation failure. This is in line with the given explanation of this incident with the fault lying within the ELAC.
TacticalSniper@reddit
In my mind this depends on the circumstance. Agreed it's most likely a very fringe case, but it also tells me there is a minuscule chance of the change being so sudden it would exceed what the airframe can bear, where potentially the pilots would be incapable of a response.
What I was essentially saying is that this looked like a small enough issue where a slow rollout of a fix would be acceptable to me, but once I've seen this last bit this became quite urgent in my view and justifies grounding the fleet.
ArsErratia@reddit
I acknowledge I'm reading between the lines here, but I interpreted it as a break-up as a result of the subsequent overspeed, not a direct overstress from sudden control inputs.
Surely it's incredibly unlikely that even extreme movements of the elevator can cause an in-flight break up, especially within the autopilot authority limits? Going even from level-flight to full-nose-down in one go seems like the kind of thing the airframe would be explicitly designed to manage (either by increasing the structural strength or limiting the travel of the elevator mechanism). If anything it would make a sensible baseline design-stress you'd start all your engineering calculations from? Which then gets a safety factor on top.
There's a huge difference between this and side-loading the vertical stabiliser à la AA 587.
Spiderspook@reddit
I thought that error correcting memory is supposed to deal with situations like high energy particles entering our atmosphere and flipping bits. Does Airbus not use ECC memory or am I misunderstanding something?
SirEDCaLot@reddit
IT person here.
ECC memory is good, but cosmic rays / solar radiation can affect systems in other ways, like data in the CPU or going to/from memory.
Airbus designs their system with TONS of redundancy- there's 3 flight control computers, running 3 different software programs (all of which are programmed to do the same thing, but in different ways), and if 2 of 3 agree on an answer but 1 disagrees the 1 is deemed unreliable and cut out of the loop.
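The "2 of 3 agree, cut out the odd one" scheme described above can be sketched in Python as a simple median voter. The function name, the tolerance value, and the sample readings are all assumptions for illustration; this is not a description of how the A320's computers actually arbitrate.

```python
def vote_2oo3(outputs, tol=0.5):
    """Two-out-of-three voter.

    Returns (value, outliers): the median output and the indices of any
    channel further than tol from it. If more than one channel is an
    outlier there is no majority, and value is None.
    """
    if len(outputs) != 3:
        raise ValueError("expected exactly three channel outputs")
    median = sorted(outputs)[1]
    outliers = [i for i, v in enumerate(outputs) if abs(v - median) > tol]
    if len(outliers) > 1:
        return None, outliers
    return median, outliers

# Hypothetical pitch values from three channels, one of them corrupted.
print(vote_2oo3([5.0, 5.1, -35.0]))   # (5.0, [2])
print(vote_2oo3([1.0, 10.0, 20.0]))   # (None, [0, 2])
```

A voter like this masks a single bad channel, but it cannot help if the fault sits in the voter itself or in data that all channels share.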
Most likely the issue is in some esoteric place where that redundancy is done, or perhaps in an error checking routine for some piece of data. For example if the new version doesn't properly error-check a piece of data before running a computation, that could cause an issue.
Pop-metal@reddit
Wrong. The shuttle had 3 flight control computers.
Airbus planes can have as many as 7.
raptor217@reddit
ECC does handle this for external RAM. It doesn’t stop a Single Event Latchup (SEL) which would be insanely rare (and not worth grounding a fleet over). It also doesn’t stop internal CPU register bits from flipping nor NAND file storage.
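For readers wondering what ECC does with a flipped bit, here is a toy single-error-correct, double-error-detect (SECDED) code in Python: an extended Hamming(8,4) code protecting 4 data bits. The bit layout and function names are choices made for this sketch; real memory controllers use wider codes in hardware.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # codeword positions that carry data bits

def encode(nibble: int) -> list:
    """Encode 4 data bits as 8 bits: index 0 is overall parity,
    indices 1, 2 and 4 are Hamming parity bits."""
    bits = [0] * 8
    for k, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> k) & 1
    syndrome = 0
    for pos in DATA_POSITIONS:
        if bits[pos]:
            syndrome ^= pos
    for p in (1, 2, 4):
        bits[p] = 1 if syndrome & p else 0
    bits[0] = sum(bits[1:]) % 2          # make total parity even
    return bits

def decode(codeword):
    """Return (nibble, status); status is 'ok', 'corrected' or 'double'."""
    bits = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos
    overall = sum(bits) % 2
    if overall == 1:
        bits[syndrome] ^= 1              # single error; syndrome 0 means bit 0
        status = "corrected"
    elif syndrome != 0:
        return None, "double"            # two errors: detected, not fixable
    else:
        status = "ok"
    nibble = sum(bits[pos] << k for k, pos in enumerate(DATA_POSITIONS))
    return nibble, status

word = encode(0b1011)
word[6] ^= 1                             # one upset
print(decode(word))                      # (11, 'corrected')
word[2] ^= 1                             # a second upset in the same word
print(decode(word))                      # (None, 'double')
```

This only protects the stored word. As noted above, a flip inside a CPU register or in unprotected storage never passes through this check.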
CosmicRayException@reddit
Someone forgot a try-catch.
nalc@reddit
Similar issue to Qantas 72?
Hidden_Bomb@reddit
This was exactly the incident I thought of. Though that was likely caused by a cosmic ray causing a particle shower.
raptor217@reddit
Cosmic ray is another name for heavy ion, it’s the same exact thing as a solar radiation event. (The physics gets a bit weird, but loosely in atmosphere you can consider them the same thing)
Sempervirens47@reddit
Solar radiation is less energetic and will not produce secondary, tertiary, etc. radiation by spallation when it makes contact with the atmosphere. Hence, "particle shower." Similar but not identical!
raptor217@reddit
That’s actually incorrect. Solar radiation events can produce both high energy protons and heavy ions. The only thing it cannot do is produce heavy ions above the “iron knee”. But the protons are more than capable of generating those through secondary particles.
swordfi2@reddit
Not sure why you got downvoted but based on the report they appear very similar
nalc@reddit
Yeah idk, both it and the original October incident seem to be SEEs causing uncommanded pitching. Kinda weird that a software rollback is fixing it because that would imply that they got rid of some sort of CRC or voter between the old version and the new version which would be a weird choice.
Ungrammaticus@reddit
They don’t have to have chosen to get rid of it, they may just have accidentally borked it up with the update.
ArsErratia@reddit
Potentially couldn't it be that it wasn't there in the first place, but the bug was prevented by other behaviour elsewhere in the code?
BoringBob84@reddit
Not only that, but they failed to test the updated version adequately. Unless the software has strict partitioning, any update to the software requires re-testing every part of the software.
CarbonKevinYWG@reddit
I don't think you realize how insane that statement is.
Testing the effect of a single bit flip in literally every bit of memory in every possible flight condition is physically impossible. That requirement would mean no software updates, ever.
BoringBob84@reddit
It sounds to me like you do not understand how software is developed in the aerospace industry and you are insulting me personally to distract from that fact.
You definitely do not understand how software is developed in the aerospace industry.
Study RTCA DO-178 and then let's have this discussion.
Chen932000@reddit
You'd do regression testing, but you wouldn't have to literally retest everything even for DAL A software. Depending on what the actual error was and what kind of SEE occurred, it's possible even your DO-178C robustness testing wouldn't catch the problem. And your functional/system level testing almost certainly doesn't take SEE into account, probably taking credit for some other analysis stating that SEE wasn't of concern for X or Y reason. Where the actual failure in the design chain is, is not yet clear since we don't have the details of what the actual failure mode is.
raptor217@reddit
Literally the only way to test true SEE effects in a comprehensive manner is to have code, in a test build, that injects random errors into memory to test how it handles it.
You’d never FLY with code that can “inject possibly dangerous error” so your binary level code is changed anyways.
There’s a massive number of analyses that can be done for safety-critical code: static analysis, stack canaries, and compile-time checks, such that you don’t need to worry about flight-proven code, just the code that has changed.
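The test-build error injection described in this comment can be sketched in Python as below. `GuardedStore` is a hypothetical data structure invented for the example (each word is kept next to its bitwise complement), and the harness flips one random bit per trial to confirm the read path notices. It is a toy, not a representation of any certified test process.

```python
import random

class GuardedStore:
    """Hypothetical store keeping each 16-bit word with its complement."""
    MASK = 0xFFFF

    def __init__(self, words):
        self.words = [w & self.MASK for w in words]
        self.shadow = [~w & self.MASK for w in words]

    def read(self, i):
        # A healthy word XOR its complement is all ones.
        if self.words[i] ^ self.shadow[i] != self.MASK:
            raise ValueError(f"corruption detected at word {i}")
        return self.words[i]

def inject_and_check(trials=1000, seed=0):
    """Flip one random bit per trial; return how many flips went undetected."""
    rng = random.Random(seed)
    undetected = 0
    for _ in range(trials):
        store = GuardedStore([rng.randrange(0x10000) for _ in range(8)])
        i = rng.randrange(8)
        target = store.words if rng.random() < 0.5 else store.shadow
        target[i] ^= 1 << rng.randrange(16)   # the injected upset
        try:
            store.read(i)
            undetected += 1
        except ValueError:
            pass
    return undetected

print(inject_and_check())   # 0
```

The injection code lives only in the test harness, which matches the point above that the flown binary would never contain it.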
Chen932000@reddit
Exactly. Now I am surprised that a flight critical system like this is susceptible to SEE but since the apparent software fix is reverting to a previous version it’s possible they introduced a software bug in the newer versions that broke some of the SEE robustness they did have already in the code. That’s the only way I can see why reverting software would “fix” this issue. Or I guess if they somehow had new software going through a different processor compared to the previous version….but that would be a HUGE change.
raptor217@reddit
I wonder if their control algorithm for external sensors has a corner case bug that only happens from SEE. You have a hard time breaking OS level SECDED with a new build but algorithm changes can always be susceptible to bad/stale data.
BoringBob84@reddit
Thank you for the additional context.
raptor217@reddit
Yeah they don’t know what they’re talking about. You test modularly and do regression tests.
CarrowCanary@reddit
6,000 aircraft affected, but 5,100 of them only need a quick software update and shouldn't be out of action for more than a few hours. The other 900 need physical components to be replaced, so they'll be grounded for pax flights for a bit longer.
https://www.bbc.co.uk/news/articles/c8e9d13x2z7o
versus1309@reddit
Is the A320 NEO impacted?
MrAeronaut@reddit (OP)
Mostly the NEOs, as they are the ones with the latest software. Apparently it is a two hour software load to change the software back to the previous version though, so this is easily manageable overnight at an airline’s engineering base
Neurock97@reddit
Yeah I have the same question
NewHope13@reddit
Anyone flying Spirit today? I’m set to fly on a 321neo tomorrow and wondering how spirit is handling these software updates
Chumpback@reddit
Was mid-flight from Cancun to Charlotte when it came down. Have now been delayed twice for the flight home. Just waiting for the cancellation at this point
asclepi@reddit
I understand it's Friday night - the one before the busy Thanksgiving return nevertheless - but so far there seems to be minimal urgency or concern. Is the impact of this going to be much less than it seems?
I have a 321 flight on Monday, not sure what to expect. I purchased a refundable ticket on an alternative A350 flight just to be sure.
Inevitable_Train1511@reddit
I think you’re good to cancel the A350 flight you should be set by Monday. Safe travels
imapilotaz@reddit
By Monday? Nothing burger except potential passenger disruptions waiting on standby or rebooked
theonion513@reddit
How could Boeing let something like this happen?
mixxituk@reddit
Well, at least it's not Christmas
lekker-boterham@reddit
My flight from queenstown to auckland this morning was just canceled due to this
thenoobtanker@reddit
Wait what? Ain’t literally a good chunk of plane flying and carrying passenger are A320? Like over half? And this is the busiest travel time of the year as well? Lunar new year doesn’t count because most of the travel is done by train in China but still. Couldn’t be at a worse time. Well I mean late next month might be worse but this, especially for the US with the double whammy of government shutdown and restart and now this.
I see many people graying out a lot over this.
LMB_mook@reddit
Apparently it's a 3 hour fix, so seems unlikely to be a major disruption.
TareasS@reddit
One instance that this was an issue in so many years with tens of thousands of planes sold.
I think you're being a bit dramatic.