How to identify the process causing elevated I/O wait?

Usjes

Hi,

I'm sporadically seeing my CPU being taken over by elevated I/O wait time, e.g.:
Screenshot from 2024-05-25 16-37-07.png

Here it's at 35%, but it rapidly climbs to ~100%, at which point the machine becomes unusable and has to be power-cycled. Is there a command I can run that will tell me which process is responsible for this behaviour? The machine has a hard disk, and when the problem occurs the disk access LED goes crazy, so it is almost certainly some process reading from or writing to disk. Is there any way to determine which one?

Thanks,

Usjes.
 



What does top say in its listing of processes that is so busy?
Use the command:
Code:
 ps -w <PID>
to see the full command line of the process with process ID <PID>, i.e. the command that started the process, with all its arguments.

I guess the %CPU and %MEM columns are the ones of greatest interest if the machine is slowing right down and locking up.

With htop, there's more interaction possible than with top.
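
If nothing stands out in the %CPU column, the process state may be more telling: a task that is blocked on disk I/O normally sits in state D (uninterruptible sleep), which top shows in its S column. As a rough sketch, the snippet below filters ps output down to those tasks. The function names filter_dstate and list_dstate are made up for illustration, and it assumes the procps version of ps.

```shell
# Keep the header line plus every row whose first column (the state) starts with "D".
filter_dstate() {
    awk 'NR == 1 || $1 ~ /^D/'
}

# Snapshot all processes and show only those in uninterruptible sleep.
list_dstate() {
    ps -eo stat,pid,user,args | filter_dstate
}

# Usage, repeated a few times while the disk LED is busy:
#   list_dstate
```

A task that shows up in state D on nearly every run is a likely suspect, although kernel threads that flush data to disk may appear there as well.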
 
What does top say in its listing of processes that is so busy?
Well, that's the problem: it doesn't show anything as being particularly busy, which is why I am looking for a command that will tell me what specifically is causing the high CPU %wa when it occurs:
Screenshot from 2024-05-26 15-09-26.png

So here CPU usage has hit 99% and the system is unusable; my only option is to power-cycle. Yet the most CPU-intensive process listed by top is using just 5.9% of the CPU. Clearly the per-process values in the %CPU column are not consistent with the %Cpu(s) value in the summary. The fact that the disk access LED is going crazy suggests that the %Cpu(s) value is correct, but I can't find out what specifically is causing it. Does anyone know how I can? Or is there a way to set a limit so that no process can use more than, e.g., 75% of the CPU, so that when this occurs the whole system doesn't freeze and I can open a terminal and kill processes one by one to identify the culprit?
 
Perhaps have a look here:

There are some proposals for fixing I/O wait times on a machine near the end of the article.

Note that the iostat command used in that article is a utility provided by the sysstat package which would need to be installed.
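
iostat reports per-device figures, so it can show that a disk is saturated but not which process is responsible. For per-process numbers, pidstat -d (also part of sysstat) or iotop are the usual tools. Where neither is available, something similar can be approximated from /proc/<PID>/io, which on kernels built with task I/O accounting exposes read_bytes and write_bytes counters. The following is only a sketch under that assumption: io_snapshot and io_delta are hypothetical helper names, and it needs to run as root to see other users' processes.

```shell
# Print "PID READ_BYTES WRITE_BYTES COMMAND" for every process whose
# /proc/<PID>/io is readable (all of them when run as root).
io_snapshot() (
    for d in /proc/[0-9]*; do
        comm=$(cat "$d/comm" 2>/dev/null) || continue
        rw=$(awk '/^read_bytes:/ {r = $2} /^write_bytes:/ {w = $2}
                  END {if (r != "" && w != "") print r, w}' "$d/io" 2>/dev/null)
        [ -n "$rw" ] || continue
        printf '%s %s %s\n' "${d#/proc/}" "$rw" "$comm"
    done
)

# Compare two snapshots; print "TOTAL READ WRITTEN PID COMMAND" in bytes,
# busiest process first. Processes missing from the first snapshot are skipped.
io_delta() {
    awk '
        NR == FNR { r[$1] = $2; w[$1] = $3; next }    # first file: baseline
        ($1 in r) {
            dr = $2 - r[$1]; dw = $3 - w[$1]
            if (dr + dw <= 0) next
            name = $4
            for (i = 5; i <= NF; i++) name = name " " $i
            printf "%.0f %.0f %.0f %s %s\n", dr + dw, dr, dw, $1, name
        }
    ' "$1" "$2" | sort -k1,1nr
}

# Usage:
#   io_snapshot > /tmp/io.1; sleep 5; io_snapshot > /tmp/io.2
#   io_delta /tmp/io.1 /tmp/io.2 | head
```

Taking the difference between two snapshots matters because the counters are cumulative since each process started; a long-running process with a large total is not necessarily the one hammering the disk right now.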

In addition to the programs suggested in that article for identifying CPU and I/O usage, the sar command from the sysstat utilities may also be useful. For example, the following command reports CPU usage for every core on the machine, taking three samples at 1-second intervals:
Code:
sar -P ALL 1 3
Other monitoring programs may only provide an average of all the cores.

If the sar command doesn't run at first, then the sysstat service can be started thus:
Code:
systemctl start sysstat.service
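
To illustrate what a per-core I/O wait percentage like the one sar -P ALL prints represents, roughly the same figure can be derived by hand from two samples of /proc/stat, where the fifth counter on each cpu line is the time spent in iowait. This is a simplified sketch, not how sar itself is implemented; iowait_pct is a made-up name, and the total deliberately sums only the user through steal counters.

```shell
# Given two saved copies of /proc/stat, print "CPU IOWAIT_PERCENT" per line.
iowait_pct() {
    awk '
        /^cpu/ {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user .. steal
            if (NR == FNR) { t0[$1] = total; w0[$1] = $6; next }
            if (!($1 in t0)) next
            dt = total - t0[$1]
            if (dt > 0) printf "%s %.1f\n", $1, 100 * ($6 - w0[$1]) / dt
        }
    ' "$1" "$2"
}

# Usage:
#   cat /proc/stat > /tmp/stat.1; sleep 1; cat /proc/stat > /tmp/stat.2
#   iowait_pct /tmp/stat.1 /tmp/stat.2
```

The line labelled cpu is the aggregate over all cores, and cpu0, cpu1 and so on are the individual cores.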
 
