Viability of forensic analysis of XFS journal
Posted by ccie6861@reddit | linuxadmin | 10 comments
Forgive the potential stupidity of this question. I know enough to ask these questions but not enough to know how or if I can take it further. Hence the post.
I am working on a business critical system that handles both medical and payment data (translation: both HIPAA and PCI regulated).
Last week a vendor made changes to the system that resulted in extended downtime. I've been asked to provide as much empirical forensic evidence as I can to demonstrate who made the change and when it happened. I can constrain the investigation to a window of roughly two hours, about four days ago.
Several key files were touched. I know the names of the files, but since they've been repaired, I no longer have a record in the active file system of who touched them or when. There is no backup or snapshot (it's a VM) that would give me enough specificity about who or when to be useful.
The fundamental question is: does XFS retain enough journal data, for long enough, for me to determine exactly when the files were touched and by whom? If not on the live system, could it be cloned and rolled back?
deeseearr@reddit
I can't help much with this but if you're running critical systems in a highly regulated environment then you should have auditing enabled and now you know why. If you were compliant with PCI DSS, specifically section 10.7, then I don't know why you wouldn't be doing this already.
Enable the built-in auditing system at the kernel level and configure audit rules (via auditctl or the files under /etc/audit/rules.d/) to do detailed tracking of all accesses to every configuration file, system log, service, or just about anything else that could conceivably be of interest. If you had done that, you could pull up a detailed trace of exactly what was done, by whom, and when with a few simple commands. You can easily send the audit logs to a remote server for storage, so there's no "but we can't afford storage" excuse.
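Something like this is all it takes; the watched paths, key name, and time range below are just examples to adapt to whatever actually matters on that box:

```
# /etc/audit/rules.d/vendor.rules -- watch writes and attribute changes
# on the config paths you care about (paths and key are placeholders)
-w /etc/myapp/ -p wa -k vendor-config
-w /etc/sysconfig/network-scripts/ -p wa -k vendor-config

# load the rules and confirm they're active
augenrules --load
auditctl -l

# later, pull every matching event with user, pid and timestamp decoded
ausearch -k vendor-config -i --start this-week
```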
As for XFS, the filesystem logs are only meant to preserve the integrity of the filesystem itself. They're not meant to be forensic so I would be surprised if any of the logs of changes were retained for very long after being committed. If you had shut the system down and imaged the disk immediately, you might be able to pull something out of it, but if it was allowed to keep running then anything of interest is almost certainly gone.
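If you did image it early enough, xfs_logprint will dump whatever is still sitting in the journal, but it speaks in transactions and inode numbers, not usernames, and on a busy filesystem the log wraps quickly. Very roughly (the device path is an example, and run it against the copy, never the original evidence):

```
# transactional view of whatever survives in the journal
xfs_logprint -t /dev/sdb1

# raw dump of the log records if you want to dig deeper
xfs_logprint -d /dev/sdb1
```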
ccie6861@reddit (OP)
Thank you. Yes, sadly I am not the sysadmin for this box, nor am I personally responsible for its security. If I were, at least the credentials would have been more secure. auditd is running and it is auditing changes to the directory that hosts the modified files. Unfortunately those logs have already rotated out. I'm working with the system owner to get the previous versions. Thank you for the suggestions!
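Once I get the rotated files back, my plan is to point ausearch straight at them, along these lines (the file names and watched path are placeholders for whatever was actually configured):

```
# read a recovered rotated log instead of the live audit.log
ausearch -if /var/log/audit/audit.log.1 -i -f /etc/myapp/app.conf
ausearch -if /var/log/audit/audit.log.2 -i -f /etc/myapp/app.conf
```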
SneakyPhil@reddit
dd the disk and make extra copies first
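Something like this, hashing the image so every later copy can be verified against it (device and paths are examples):

```
# image the disk and record a hash for chain of custody
dd if=/dev/sda of=/evidence/sda.img bs=4M conv=noerror status=progress
sha256sum /evidence/sda.img | tee /evidence/sda.img.sha256

# work only on copies, keep the first image pristine
cp /evidence/sda.img /evidence/sda-work.img
```

Since it's a VM you can also clone the virtual disk at the hypervisor level and hash the copy, which gets you the same thing without touching the guest at all.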
ccie6861@reddit (OP)
We know it was the vendor, but the vendor used a shared account with real root access (su, not sudo), and they OpenVPN in for remote service, so the login will tell me who was on in general terms, but probably not who edited the file or when. For such a sensitive system, it has shit controls.
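What I can still pull in the meantime are the login and VPN records, something like this (the log paths are RHEL-style defaults, so treat them as examples):

```
# logins over the window, with full timestamps
last -F -x

# su to root from the auth log
grep -i 'session opened for user root' /var/log/secure

# OpenVPN sessions, to tie the window to a source IP / client cert
grep -i openvpn /var/log/messages
```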
I'll take a look at auditd. I have not done that.
SneakyPhil@reddit
How did the system crash? Are you sure it wasn't load related or just shit software?
ccie6861@reddit (OP)
The system didn't crash. The person edited the network configuration in a way that cut the legs out from under the stool, and then didn't communicate it. I'm actually the network admin, and I was only told several hours later, after someone had squelched the monitoring for the system, when they asked me if there had been any known network outages. The vendor refused to fix it because it was "a local network problem". They never owned up to the changes.
SneakyPhil@reddit
You may have network interface logs in dmesg if the system hasn't been rebooted. However, seeing that you do HIPAA, you should have centralized monitoring that stored all these logs on a separate server.
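Rough idea of where to look, assuming systemd journald and NetworkManager (both assumptions) and an example interface name:

```
# kernel ring buffer with readable timestamps (gone after a reboot)
dmesg -T | grep -iE 'link|eth0'

# if journald is set to persistent storage, this survives reboots
journalctl -k --since "4 days ago" | grep -i link
journalctl -u NetworkManager --since "4 days ago"
```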
ccie6861@reddit (OP)
I didn't look, but my assumption was that Splunk wouldn't show much because it also lost access. However, in thinking it through, your suggestion is a good one. There would have had to be at least SOME delay between the changes and the failure. Thank you for the idea.
SneakyPhil@reddit
Splunk would have logged everything up to the moment the network died. Start there and work back, say, 30 minutes and see what you find.
xxxsirkillalot@reddit
This reeks of the hospital envs I've worked with in my past lol. I learned really quickly working for MSPs to reboot customer systems before making any of my changes, so as to uncover things like this. Sometimes the only defense you have is "I literally made no change, I rebooted it before I started my work to see if it would come up properly".
If you can't find anything in your hunt, do a backup restore of that VM to $day_outage_occurred - 1 and see if the config file is already f'd up and waiting on a daemon reboot to bomb stuff... surely you have backups of this critical system
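If you do go that route, mount the restored disk read-only somewhere and diff the suspect config against what's live now; the timestamps on the restored copy at least bound when the file was last known good (device and paths here are just examples):

```
# mount the restored disk read-only and compare the suspect config
mount -o ro /dev/vdb1 /mnt/restore
diff -u /mnt/restore/etc/sysconfig/network-scripts/ifcfg-eth0 \
        /etc/sysconfig/network-scripts/ifcfg-eth0

# mtime/ctime on the restored copy show when it was last known good
stat /mnt/restore/etc/sysconfig/network-scripts/ifcfg-eth0
```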