Is it possible to simulate ECC mode with non-ECC RAM?
Posted by 5TR4TR3X@reddit | linuxadmin | 5 comments
I am curious whether it is possible to instruct Linux, with a kernel module or something similar, to simulate ECC on non-ECC memory. I am thinking of an approach that offloads the error correction to the CPU through the OS, so the error correction bits could be stored in regular memory.
I understand it may put an extra load onto the CPU, but I am not sure how heavy that extra load could be.
Is there a module or tool for Linux that does what I am envisioning?
BradChesney79@reddit
I was just thinking along the same lines as OP, and I may be resurrecting this post unnecessarily.
Have most of the RAM work as normal but, at the lowest level possible, allocate a chunk of reserved space for the check bits, and accept the performance hit that every post here seems to agree on. It is completely logical that this would gum up the works and make everything slower.
I saw a comment about being able to make the CPU use L3 and skip L1 and L2 cache. The performance would probably be every bit as good as dog water, but it would meet the requirements of the idea.
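To put a rough number on the reserved chunk, here is a minimal sketch. It assumes the same ratio that conventional ECC DIMMs use, 8 check bits per 64 data bits; a software scheme could pick a different ratio, and `usable_bytes` is a made-up helper name, not an existing tool.

```python
def usable_bytes(total_bytes, data_bits=64, check_bits=8):
    """Bytes left for data when check bits live in the same RAM.

    Assumption: 8 check bits per 64 data bits, as on a 72-bit ECC DIMM.
    """
    return total_bytes * data_bits // (data_bits + check_bits)


GIB = 2**30
total = 32 * GIB
left = usable_bytes(total)
print(f"{total / GIB:.2f} GiB installed, about {left / GIB:.2f} GiB usable")
```

Under that assumption, one ninth of the installed RAM goes to check bits before any performance cost is counted.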
ryao@reddit
There is no tool, but it is theoretically possible to make one. It would be a hypervisor that puts the CPU into a little-known mode where it uses the L3 cache as RAM. The performance would be fairly bad: the CPU would only have a tiny amount of trusted memory, so it would be constantly swapping pages in and calculating software ECC. Storing the ECC bits would also reduce the amount of RAM available to you.
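For a sense of what "calculating software ECC" could involve, here is a minimal sketch of a Hamming code with an extra overall parity bit (SECDED: single error correction, double error detection) over one byte. It is a toy, not what such a hypervisor would actually run: real ECC memory works on 64-bit words, and the names `encode` and `decode` are illustrative only.

```python
TOTAL = 12  # codeword positions 1..12; index 0 holds the overall parity bit
# Powers of two (1, 2, 4, 8) hold parity bits; the other positions hold data.
DATA_POS = [p for p in range(1, TOTAL + 1) if p & (p - 1) != 0]


def syndrome(bits):
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, TOTAL + 1):
        if bits[pos]:
            s ^= pos
    return s


def encode(byte):
    """Encode one byte into a 13-bit SECDED codeword (list of 0/1)."""
    bits = [0] * (TOTAL + 1)
    for i, pos in enumerate(DATA_POS):
        bits[pos] = (byte >> i) & 1
    s = syndrome(bits)
    for k in range(4):
        if s & (1 << k):
            bits[1 << k] = 1  # set parity bits so the syndrome becomes 0
    bits[0] = sum(bits) & 1  # make the overall parity even
    return bits


def decode(bits):
    """Return (byte, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(bits)
    s = syndrome(bits)
    odd = sum(bits) & 1
    if s == 0 and not odd:
        status = "ok"
    elif odd and s <= TOTAL:
        bits[s] ^= 1  # single flipped bit at position s (0 = parity bit itself)
        status = "corrected"
    else:
        return None, "uncorrectable"  # two or more bits flipped
    byte = sum(bits[pos] << i for i, pos in enumerate(DATA_POS))
    return byte, status


word = encode(0xA7)
word[5] ^= 1  # simulate a single bit flip in memory
print(decode(word))
```

Doing this in software on every memory access is where the heavy performance cost would come from; hardware ECC does the equivalent in the memory controller.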
BradChesney79@reddit
Learn something every day.
msanangelo@reddit
No, because the ECC is done in RAM with a dedicated chip. The OS has no concept of it.
greysourcecode@reddit
Late to this thread, but there is "in-band ECC". You need a motherboard and CPU that support it (only 13th gen Intel as far as I know, and not many motherboards support it). It's basically a modification to the CPU's integrated MMU. The MMU sets aside parts of memory for parity, but you take a pretty large hit to total RAM capacity (I think I saw an example of 32GB going down to 26GB). I'm not sure whether the parity calculations are done in the MMU or somewhere else, but I'd imagine you'd take a hit to MT/s.
While it might "technically be possible" to create some sort of fake ECC, it would not be efficient. The RAM interfaces directly with the CPU, so you'd need to run some sort of checksum at the kernel level, which would slow down every operation. The CPU would then spend almost 60% of its time running checksums rather than your program (and your checksum could itself get a bit flipped), and even then the kernel wouldn't be protected. You could also run three copies of the same process in parallel. You'd have to write a custom kernel (which would still not be protected), split your available cores and RAM into three sections, run your processes on all three simultaneously, and have the scheduler compute a checksum after each process interrupt (basically high availability). You need three because you need a tie breaker. This way you're at least running at your CPU's full clock speed, but you'd only have 1/3 of the available cores and RAM, and to be honest I can't even imagine what a headache programming that kernel scheduler would be.
You could maybe use something like zram to allocate block storage, then use it as swap. You'd need to modify the kernel modules responsible for swap and/or zram to include some sort of ECC like ZFS (though ZFS mostly focuses on object storage). You'd take a good performance hit, but it's a better solution than the rest. As always, only your user-space applications would be protected, which sucks because all transfers to disk go through the kernel.
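The "run three and use a tie breaker" idea is majority voting. Below is a minimal user-space sketch, assuming each replica hands back a hashable result such as an int or bytes; `vote` and `bitwise_majority` are made-up names, and nothing here touches the kernel scheduler.

```python
from collections import Counter


def vote(results):
    """Return the value that at least two of the three replicas agree on."""
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("all three replicas disagree")
    return value


def bitwise_majority(a, b, c):
    """Per-bit majority of three integers: repairs a flip in any one copy."""
    return (a & b) | (a & c) | (b & c)


good = 0b1011
flipped = good ^ 0b0100  # one replica suffered a bit flip
print(vote([good, flipped, good]))
print(bin(bitwise_majority(good, flipped, good)))
```

With only two replicas you could detect a mismatch but not tell which copy is wrong, which is why a third is needed.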
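As a rough user-space analogy (not actual zram or ZFS code), here is a sketch of a compressed page store that keeps a CRC32 per page plus a second copy, so a corrupted copy can be detected and the other one used instead. The class `CheckedPageStore` and its layout are assumptions made up for illustration.

```python
import zlib


class CheckedPageStore:
    """Toy compressed page store with a checksum and a mirrored copy."""

    def __init__(self):
        self._slots = {}  # page number -> (crc32, [copy_a, copy_b])

    def store(self, page_no, data):
        blob = zlib.compress(data)
        copies = [bytearray(blob), bytearray(blob)]
        self._slots[page_no] = (zlib.crc32(blob), copies)

    def load(self, page_no):
        crc, copies = self._slots[page_no]
        for blob in copies:
            if zlib.crc32(bytes(blob)) == crc:  # checksum verifies this copy
                return zlib.decompress(bytes(blob))
        raise IOError("both copies failed the checksum")


store = CheckedPageStore()
store.store(0, b"hello" * 100)
store._slots[0][1][0][3] ^= 0x10  # simulate a bit flip in the first copy
print(store.load(0) == b"hello" * 100)
```

The stored checksum itself sits in unprotected memory here, which mirrors the caveat above: whatever does the checking is not protected either.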
It's a fun thought experiment and a cool idea to explore, but not even close to being as fast, efficient, or safe as true ECC.