processes stuck in nanosleep() with rt kernel

j55823

Hey all. I have an embedded board with a quad-core ARM Cortex-A53 that's running kernel 5.4.74. We build the kernel ourselves, and we have it patched with the latest 5.4.74-rt42 patches from here (https://mirrors.edge.kernel.org/pub/linux/kernel/projects/rt/5.4/older/).

We have an instability in our system. If it is left on long enough (30+ minutes), processes that are blocked in the nanosleep() system call are never woken up. It's not just our own user apps that get stuck either; it's any process in the OS that calls nanosleep(). The "ping" program gets stuck in a sleep, and even the "sleep" command run from bash gets stuck.

We've noticed that the kernel timer_list on CPU 0 accumulates timer entries with a negative time-to-expiry while the system is in this state. As far as I can tell, a negative value means the entry expired long ago. It's almost as if the timer queues are not being serviced quickly enough, or possibly not at all.
[Attached image: 1679001953237.png]
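For anyone who wants to track this over time instead of eyeballing the dump, here is a rough sketch that counts overdue hrtimers per CPU. It assumes the usual /proc/timer_list layout, where each CPU section starts with a "cpu: N" line and each timer has an "expires at ... [in A to B nsecs]" line; the function name count_overdue is made up for illustration. A healthy system can show a slightly negative value now and then, so it is the count growing over time that matters.

```python
import re
from collections import Counter

# Assumed /proc/timer_list layout: "cpu: N" opens each per-CPU section and
# every timer has a line ending in "[in <soft> to <hard> nsecs]".
CPU_RE = re.compile(r"^cpu:\s*(\d+)")
EXPIRES_RE = re.compile(r"\[in (-?\d+) to (-?\d+) nsecs\]")


def count_overdue(timer_list_text):
    """Return {cpu: number of timers whose hard expiry is already in the past}."""
    overdue = Counter()
    cpu = None
    for line in timer_list_text.splitlines():
        m = CPU_RE.match(line.strip())
        if m:
            cpu = int(m.group(1))
            continue
        m = EXPIRES_RE.search(line)
        # group(2) is the hard expiry relative to "now"; negative means overdue.
        if m and cpu is not None and int(m.group(2)) < 0:
            overdue[cpu] += 1
    return dict(overdue)
```

On the board this would be called as count_overdue(open("/proc/timer_list").read()), which normally needs root. Logging the result every few seconds should show whether the backlog is limited to CPU 0 and when it starts to grow.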


If we fall back to the normal 5.4.74 kernel (i.e. don't apply the rt patch), then this issue disappears altogether. We're wondering where else we can look for a potential solution, or what next steps might help debug this instability. Thanks!
 


even the bash "sleep" command will get stuck.

So, sleep really sleeps.

I read your post and I haven't got a clue how to even start debugging this. I'm just a lowly janitor anyhow.

As a custodian, I wonder if this might be better in the Single Board Computer section? We aim for the *most appropriate* sub-forum, which can be confusing as heck at times. This just seems more complex/specific than 'general', so if you can think of a better sub-forum let me know and I'll move it there. As it's an embedded system, I'd think SBC would be about right.
 
Thanks. I've edited the title and original post to be a little clearer. It's the nanosleep() system call that processes are getting stuck in. We found this by running `strace` on the stuck processes: nanosleep() is the last system call they make before blocking indefinitely.
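In case it helps anyone reproduce this, below is a minimal sketch of a check that calls nanosleep() directly and reports whether it came back. It goes through ctypes because Python's own time.sleep() does not use nanosleep() on every version. It assumes Linux with glibc, where both fields of struct timespec are C long; the names raw_nanosleep and sleep_returns_within are made up for illustration. The waiting side busy-polls on purpose, since any timed wait might hang on the same timer path.

```python
import ctypes
import threading
import time


class Timespec(ctypes.Structure):
    # Assumption: glibc layout, where time_t and tv_nsec are both C long.
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]


_libc = ctypes.CDLL(None, use_errno=True)
_libc.nanosleep.argtypes = [ctypes.POINTER(Timespec), ctypes.POINTER(Timespec)]
_libc.nanosleep.restype = ctypes.c_int


def raw_nanosleep(seconds):
    """Call libc nanosleep() directly and return the measured elapsed seconds."""
    whole = int(seconds)
    req = Timespec(whole, int((seconds - whole) * 1_000_000_000))
    start = time.monotonic()
    _libc.nanosleep(ctypes.byref(req), None)
    return time.monotonic() - start


def sleep_returns_within(seconds, deadline):
    """Return True if nanosleep(seconds) comes back before `deadline` seconds."""
    done = threading.Event()

    def worker():
        raw_nanosleep(seconds)
        done.set()

    # Daemon thread, so a sleep that never returns cannot block interpreter exit.
    threading.Thread(target=worker, daemon=True).start()
    start = time.monotonic()
    # Busy-poll the clock instead of using a timed wait (see note above).
    while time.monotonic() - start < deadline:
        if done.is_set():
            return True
    return done.is_set()
```

Calling sleep_returns_within(0.1, 5.0) in a loop and recording the uptime at the first False would give a timestamp for when the system enters the bad state, which could then be lined up against the timer_list dumps.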

As for which sub-forum this should be in, I'm not sure. While we are running Linux on an SBC, I suspect the issue isn't unique to it being an SBC. That being said, perhaps it is best to put it there if that community is more geared toward Linux kernel customizations like this.
 
That being said, perhaps it is best to put it there if that community of folks are more geared toward linux kernel customizations like this.

There are folks who do check certain forums first/more, so consider it done.

I wish I had your answers, but I do not.
 
