Your system or server has high io, what do you do?
Posted by AgreeableIron811@reddit | sysadmin | 18 comments
This is one of the problems I get that I hate most. How do I go about this? Yes, I know that I can use iostat, iotop, top, and htop.
I_can_pun_anything@reddit
I call in. An IO-U
saysjuan@reddit
Nothing, it’s doing its job. High I/O means it’s busy doing important work. Don’t disturb the hamsters, they get pissed off when you stop the wheel from running.
james4765@reddit
I've had a couple of RAID arrays that had terrible IOPS even after they failed over to a hot spare and rebuilt. It took a reboot to get them back to reasonable performance.
Some drives, both HDD and SSD, get slower as error rates build up - checking SMART data is also important.
Some applications (Java especially) also behave very badly when hitting the limits of physical memory: the garbage collector can cause wild thrashing with no warning. In-house applications can also be doing Very Dumb Things with disk I/O, and that can cause excessive I/O. The iotop command is my go-to for figuring out which process is being a resource hog.
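For anyone who wants to script the iotop-style check above, here is a minimal sketch. It assumes a Linux host where /proc/<pid>/io is readable (usually that means running as root); the function names are made up for illustration and this is not how iotop itself is implemented.

```python
import os


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value)
    return stats


def snapshot(proc_root="/proc"):
    """Return {pid: (read_bytes, write_bytes)} for every readable process."""
    result = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return result  # no /proc here (not Linux)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "io")) as fh:
                stats = parse_proc_io(fh.read())
        except OSError:
            continue  # process exited, or we lack permission
        result[int(entry)] = (stats.get("read_bytes", 0),
                              stats.get("write_bytes", 0))
    return result


def top_io(before, after, limit=5):
    """Rank PIDs by bytes read plus written between two snapshots."""
    deltas = []
    for pid, (r1, w1) in after.items():
        if pid in before:
            r0, w0 = before[pid]
            deltas.append((r1 - r0 + w1 - w0, pid))
    return sorted(deltas, reverse=True)[:limit]
```

Take one snapshot(), wait a few seconds, take another, and pass both to top_io() to get (bytes, pid) pairs with the heaviest process first. Processes that started after the first snapshot are skipped.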
aenae@reddit
Use iostat, iotop, top and/or htop.
Look at what runs: why is it doing high I/O, is it thrashing the swap, is it getting DDoSed, does it lack other resources?
Dive deeper: strace processes, use eBPF tools to track writes, etc.
There is no singular answer.
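One way to put a number on the "is it thrashing the swap" question is to sample the pswpin and pswpout counters in /proc/vmstat. A rough sketch, assuming a Linux kernel that exposes those counters; the function names are illustrative only, and what rate counts as "thrashing" depends on the box.

```python
def parse_vmstat(text):
    """Parse 'name value' lines from /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as fh:
        return parse_vmstat(fh.read())


def swap_rate(before, after, seconds):
    """Pages swapped in and out per second between two samples."""
    swapped_in = (after.get("pswpin", 0) - before.get("pswpin", 0)) / seconds
    swapped_out = (after.get("pswpout", 0) - before.get("pswpout", 0)) / seconds
    return swapped_in, swapped_out
```

Call read_vmstat() twice with a known gap and feed both results to swap_rate(). Sustained non-zero values in both directions suggest memory pressure is driving the disk I/O rather than the application's own reads and writes.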
2FalseSteps@reddit
"What??? Ain't nobody got time to properly troubleshoot anything. Just reboot it!" - The "devs" I work with that think they know everything.
NiiWiiCamo@reddit
High or too high?
Temps?
IO delay? Check swap, look for misbehaving processes.
RAM util? Increase RAM and possibly disable swap.
Networking? Check the traffic.
Storage? More storage, all NVMe, get any disk below 80% capacity utilization.
Cloud? Any known issues or the wrong instance for your load?
Physical? Any recent changes or just steady growth? Scale out if necessary.
Is the load constant or only when something happens?
Still too high IO delay? Rethink my deployment.
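The "below 80% capacity" item in the checklist above is easy to automate. A small sketch using only the standard library; the 80% default comes from the comment, while the function names and the idea of passing a list of mount points are assumptions for illustration.

```python
import shutil


def percent_used(total, used):
    """Used space as a percentage of total; 0.0 for an empty filesystem."""
    return 100.0 * used / total if total else 0.0


def over_threshold(paths, threshold=80.0):
    """Return (path, percent) for every path at or above the threshold."""
    flagged = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        pct = percent_used(usage.total, usage.used)
        if pct >= threshold:
            flagged.append((path, round(pct, 1)))
    return flagged
```

For example, over_threshold(["/", "/var"]) would list whichever of those two filesystems is at 80% or more. The figure can differ slightly from what df prints, because df accounts for blocks reserved for root.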
AgreeableIron811@reddit (OP)
Let's say it is a physical Linux server. I SSH to the machine. Run top -> 80% wa. Run iostat -> 100% util. Then iotop -> to find the things that take up a lot of CPU. What is the step between this and killing the process?
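Before killing anything, it can help to confirm which device is actually the busy one. The sketch below approximates an iostat-style %util figure from two reads of /proc/diskstats, using the "milliseconds spent doing I/O" counter. It assumes the standard Linux diskstats layout; the function names are hypothetical and this is not iostat's actual code.

```python
def parse_diskstats(text):
    """Map device name -> milliseconds spent doing I/O (10th stat field)."""
    busy_ms = {}
    for line in text.splitlines():
        parts = line.split()
        # parts[0:3] are major, minor, name; the stat fields start at parts[3]
        if len(parts) >= 13:
            busy_ms[parts[2]] = int(parts[12])
    return busy_ms


def read_diskstats(path="/proc/diskstats"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as fh:
        return parse_diskstats(fh.read())


def utilisation(before, after, interval_ms):
    """Percent of the interval each device had I/O in flight, capped at 100."""
    util = {}
    for dev, ms in after.items():
        if dev in before:
            util[dev] = min(100.0, 100.0 * (ms - before[dev]) / interval_ms)
    return util
```

Take two read_diskstats() samples a known number of milliseconds apart and pass them to utilisation(). Keep in mind that on devices that serve requests in parallel (RAID arrays, SSDs, NVMe) a value near 100% means the device was always busy, not necessarily that it was saturated.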
obviousboy@reddit
From Gemini - pretty decent rundown of what I’ve done in the past.
AgreeableIron811@reddit (OP)
I wish I could upvote this 100 times
ProfessorWorried626@reddit
Depends on the setup. If you’re running an AWS instance vs a physical server the answers are going to be very different.
AgreeableIron811@reddit (OP)
Physical server. In this case I have a NAS server with high I/O, but there are different scenarios. I am just interested in how you strategise, as you are more experienced than me.
jaydizzleforshizzle@reddit
Are you doing block transfers, file system transfers, network shares, or what? Commercial NAS boxes tend to have shit CPUs, because the intention is to use FC to just do block transfers and not have to deal with the CPU.
Tech_Mix_Guru111@reddit
I’d say it's an issue related to block size or MTU, and it could also be the type of data being transmitted.
running101@reddit
Depends on what system it is. If it is a database, see if there are indexes that need to be added. Application server? Could be a lot of different things: increase RAM so it caches more, or rearchitect the system with a caching layer.
ConfectionCommon3518@reddit
Memory being tight and thus a lot of RIRO action (roll in and roll out), aka virtual memory. I always think of it as that because I cut my teeth on ICL mainframes and it was a term they used.
But it's time to dig out the basic things, starting with: is the disk array in disarray, aka dead drives? Is crappy software being pushed out and hammering the system due to devs being idle, etc.?
ConstructionSafe2814@reddit
It might not add much, but nmon is also a nice utility. I like pressing "l" for long term CPU statistics. My bet is that you'll see blue wait states in the CPU. If you press "c" you might see which core/CPU the wait states are on. As I said, if you use iostat, iotop and all, it might not add much, but it might display data a bit differently than you're used to.
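The same per-core wait figures can be pulled straight from /proc/stat if nmon is not installed. A minimal sketch, assuming the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are made up for illustration.

```python
def parse_cpu_times(text):
    """Map 'cpu', 'cpu0', ... -> (iowait_ticks, total_ticks) from /proc/stat."""
    times = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            values = [int(v) for v in parts[1:9]]  # user through steal
            if len(values) >= 5:
                times[parts[0]] = (values[4], sum(values))
    return times


def read_cpu_times(path="/proc/stat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as fh:
        return parse_cpu_times(fh.read())


def iowait_percent(before, after):
    """Percent of elapsed CPU time each CPU spent waiting on I/O."""
    result = {}
    for cpu, (wait1, total1) in after.items():
        if cpu in before:
            wait0, total0 = before[cpu]
            elapsed = total1 - total0
            result[cpu] = 100.0 * (wait1 - wait0) / elapsed if elapsed else 0.0
    return result
```

Sample read_cpu_times() twice a few seconds apart and compare with iowait_percent(). The "cpu" key is the aggregate across all cores, and "cpu0", "cpu1" and so on are the individual cores.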
Exzellius2@reddit
Scale Up?