I built a TSA tool for Linux to find the "hidden" CPU wait time
Posted by AnkurR7@reddit | linuxadmin | 17 comments
Standard tools like htop usually just show CPU %, but I needed to know why threads were stalling when they WEREN'T using CPU. I found a footnote in Brendan Gregg's Systems Performance book saying a native Linux TSA (Thread State Analysis) tool was missing, so I tried to build one in Rust.
It uses raw netlink taskstats to get microsecond-precision delay accounting, and it shows exec % vs. sched wait % vs. disk I/O %. I had some trouble with kernel caching in newer versions (5.15+), but it works well for active threads.
Check it out if you're debugging noisy neighbors or disk latency.
issues:
https://github.com/AnkurRathore/tsastat
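For readers wondering how an exec % vs. sched wait % vs. disk I/O % split can be derived, here is a minimal sketch. It assumes two cumulative samples taken some interval apart and diffs them; this is not taken from the tsastat source. The struct and function names (`DelaySample`, `Breakdown`, `breakdown`) are hypothetical, and the fields only loosely mirror `cpu_run_real_total`, `cpu_delay_total` and `blkio_delay_total` from the kernel's `struct taskstats`.

```rust
/// Cumulative per-thread counters in nanoseconds.
/// Hypothetical type: the fields loosely mirror cpu_run_real_total,
/// cpu_delay_total and blkio_delay_total from the kernel's `struct taskstats`.
#[derive(Debug, Clone, Copy)]
pub struct DelaySample {
    pub exec_ns: u64,
    pub sched_wait_ns: u64,
    pub disk_io_ns: u64,
}

/// Share of a wall-clock interval spent in each state, in percent.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Breakdown {
    pub exec_pct: f64,
    pub sched_wait_pct: f64,
    pub disk_io_pct: f64,
}

/// Diff two cumulative samples taken `interval_ns` apart and express each
/// delta as a percentage of the interval. Returns None for a zero interval.
pub fn breakdown(prev: DelaySample, curr: DelaySample, interval_ns: u64) -> Option<Breakdown> {
    if interval_ns == 0 {
        return None;
    }
    // saturating_sub guards against counters that appear to go backwards
    // (for example if the thread was replaced between samples).
    let pct = |before: u64, after: u64| {
        after.saturating_sub(before) as f64 * 100.0 / interval_ns as f64
    };
    Some(Breakdown {
        exec_pct: pct(prev.exec_ns, curr.exec_ns),
        sched_wait_pct: pct(prev.sched_wait_ns, curr.sched_wait_ns),
        disk_io_pct: pct(prev.disk_io_ns, curr.disk_io_ns),
    })
}
```

The percentages need not sum to 100: whatever is left over is time the thread spent in states this sketch does not track, such as voluntary sleep.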
The_Real_Grand_Nagus@reddit
Oh did you now?
nawcom@reddit
From the code itself to the readme text, it's very obviously vibecoded.
slippery@reddit
Vibe coded is coded. It's just the next layer up in abstraction.
You wouldn't worry about a program that wasn't written directly in machine code. Or that wasn't written in assembler. Or hey, that wasn't written in C. Most programs from now on will be written in English (or another natural language), then converted to machine code at either compile time or run time.
billdietrich1@reddit
Does it work? Is it useful? If so, why should I care how it was built?
AnkurR7@reddit (OP)
Why don't you try it out?
billdietrich1@reddit
CPU wait time is not an issue I care about. But I have no problem with it being vibe-coded or not.
mk_gecko@reddit
I'm shocked by the nasty toxic comments in reply to you.
Well done on taking initiative and doing this. Even if it doesn't meet others' needs I'm sure that you learned a lot.
whamra@reddit
In my experience what we actually need is to FIND the thread and process that are causing such issues. I can't think of any use case for monitoring a single thread and watching those numbers. What knowledge do I gain? How will it help me diagnose anything?
AnkurR7@reddit (OP)
Usually, you would use top or htop to find the "who" (the culprit process). tsastat is for the next step: finding the "why."
If a thread is slow but CPU usage looks low, you're usually guessing. Is it waiting for a turn on the CPU (scheduler latency)? Is it blocked on a disk read? Or is it thrashing swap?
High CPU WAIT tells you the system is saturated and you need more cores or better pinning. High I/O WAIT tells you to go look at the storage backend. It’s about narrowing down the search space before you break out the heavy tracers like perf or strace.
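The triage described above can be sketched as a tiny function that picks whichever wait category dominates and maps it to a next step. This is only an illustration, not tsastat's logic: the `Wait` enum, `dominant_wait` and `next_step` are hypothetical names, and the swap hint is an assumption added to cover the "thrashing swap" case.

```rust
/// Wait categories from the comment above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Wait {
    Scheduler,
    DiskIo,
    Swap,
}

/// Return the category with the largest accumulated wait, or None if the
/// thread did not wait at all. On an exact tie the later entry wins.
pub fn dominant_wait(sched_ns: u64, disk_io_ns: u64, swap_ns: u64) -> Option<Wait> {
    let candidates = [
        (sched_ns, Wait::Scheduler),
        (disk_io_ns, Wait::DiskIo),
        (swap_ns, Wait::Swap),
    ];
    candidates
        .iter()
        .copied()
        .filter(|(ns, _)| *ns > 0)
        .max_by_key(|(ns, _)| *ns)
        .map(|(_, wait)| wait)
}

/// Where to look next for each category.
pub fn next_step(wait: Wait) -> &'static str {
    match wait {
        Wait::Scheduler => "CPU saturated: consider more cores or better pinning",
        Wait::DiskIo => "look at the storage backend",
        // Assumption: not stated in the comment, added for completeness.
        Wait::Swap => "check memory pressure and swap activity",
    }
}
```

Picking the largest value avoids inventing thresholds; a real tool would probably also want a cutoff below which no category is reported.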
Also, it actually pulls all threads for the PID you target, so it helps identify which specific worker thread in a pool is the one that's actually stalled.
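One common way to pull all threads for a PID on Linux is to list the numeric entries under `/proc/<pid>/task`. The sketch below assumes that approach; whether tsastat does it this way is not stated here, and `thread_ids_in` and `thread_ids` are hypothetical helper names.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Collect the numeric entry names from a `/proc/<pid>/task`-style directory.
/// Entries that are not valid thread IDs are skipped.
pub fn thread_ids_in(task_dir: &Path) -> io::Result<Vec<u32>> {
    let mut tids = Vec::new();
    for entry in fs::read_dir(task_dir)? {
        let entry = entry?;
        if let Some(tid) = entry
            .file_name()
            .to_str()
            .and_then(|name| name.parse::<u32>().ok())
        {
            tids.push(tid);
        }
    }
    // read_dir order is unspecified, so sort for stable output.
    tids.sort_unstable();
    Ok(tids)
}

/// Thread IDs of a live process on Linux.
pub fn thread_ids(pid: u32) -> io::Result<Vec<u32>> {
    thread_ids_in(Path::new(&format!("/proc/{pid}/task")))
}
```

Each returned TID can then be sampled individually, which is what makes it possible to single out one stalled worker in a pool.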
whamra@reddit
No, see, that's the thing about this issue. The who is not using a lot of cpu. Looks pretty much like everyone else. But his threads are constantly waiting for things, sometimes disrupting the entire flow of the system.
Your tool can probably confirm my suspicions about a process, but not really help find it. If I do know which one to look at, I'd probably just kill it and see what happens :D
AnkurR7@reddit (OP)
That is a great point, and it’s the classic 'Silent Killer' scenario. A process stuck in D-state (uninterruptible sleep) or holding a kernel mutex while waiting for I/O won't ever show up at the top of htop, but it can hang the whole system.
You have actually given me a great idea for the next feature: a System-Wide Mode. Instead of targeting one PID, the tool would poll all active processes and sort them by the highest WAIT states rather than CPU usage. That would basically turn tsastat into a 'searchlight' for those silent processes that are disrupting the flow without burning cycles. Thanks for the push—I am adding 'Global Sorting by Wait State' to the roadmap.
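A rough sketch of what 'Global Sorting by Wait State' could look like, assuming per-process wait totals have already been collected. The feature is only on the roadmap, so everything here is hypothetical, including the `ProcWait` struct and the `top_waiters` function.

```rust
/// Per-process totals for one sampling interval, in nanoseconds
/// (hypothetical struct, not part of tsastat).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProcWait {
    pub pid: u32,
    pub exec_ns: u64,
    pub sched_wait_ns: u64,
    pub io_wait_ns: u64,
}

impl ProcWait {
    /// Combined time spent waiting rather than running.
    pub fn total_wait_ns(&self) -> u64 {
        self.sched_wait_ns.saturating_add(self.io_wait_ns)
    }
}

/// Return the `n` processes with the highest total wait, ignoring CPU usage.
/// Ties are broken by ascending PID so the output is deterministic.
pub fn top_waiters(mut procs: Vec<ProcWait>, n: usize) -> Vec<ProcWait> {
    procs.sort_by(|a, b| {
        b.total_wait_ns()
            .cmp(&a.total_wait_ns())
            .then(a.pid.cmp(&b.pid))
    });
    procs.truncate(n);
    procs
}
```

Sorting on wait time instead of `exec_ns` is the whole point: a process that barely runs but waits constantly ends up at the top of the list.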
chocopudding17@reddit
Get this slop outta here
AnkurR7@reddit (OP)
Really. Thanks
chocopudding17@reddit
If you're actually a real human being, then don't publish code and posts that come from an LLM. Simple as that. If you are actually a systems person, then let that shine through! Don't cloud it with AI bullshit; write your own code, and write your own posts. If you don't have time to write it, I've not got time to read it (let alone run some crappy code).
AnkurR7@reddit (OP)
There was some research that went into developing it, and the code was written by me, but I will not waste my time proving it. If you do not want to read or use it, the choice is yours. Thanks.
Theratchetnclank@reddit
Lol, you even AI-generated your reply because you don't know what you are talking about.
If people wanted to engage with a bot, I'm sure they would; no need to post AI-generated responses on Reddit.
AnkurR7@reddit (OP)
I am not a bot. Just a person who was reading Brendan Gregg's Systems Performance book and thought of developing this tool in Rust to understand and learn.