How to troubleshoot why the system went down?

oslon

This is not a specific question per se, so please bear with me. I've already consulted ChatGPT and Google; I'll put down my points here.
A Linux server running k3s and Docker crashed.
Restarting it fixed the issue. :)
Now, how do I troubleshoot why it happened?
Is there anything I can do post-incident?
If not, is there something I can do to debug the next incident?
What I think I can do:
I can install sar and then monitor for anything unusual.
Is there anything I am missing?
The application logs stop at the moment the server crashes, so I don't think they will tell me much. My only hope is the logs in Rancher itself, but let's see.
Please give me some guidance.
 


Usually, it's a good idea to run

dmesg
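One caveat: after a reboot, dmesg only shows the kernel ring buffer of the current boot, so the messages from just before the crash are no longer there. If the distro uses systemd with a persistent journal (an assumption; check whether /var/log/journal exists), the kernel log of the previous boot may still be readable. A minimal sketch, where the function name prev_boot_kernel_log is made up for illustration:

```shell
# Print kernel messages from the boot before the current one.
# Assumes systemd-journald with persistent storage; without it,
# journalctl will report that no earlier boot is available.
prev_boot_kernel_log() {
    if ! command -v journalctl >/dev/null 2>&1; then
        echo "journalctl not found; this distro may not use systemd" >&2
        return 1
    fi
    # -k: kernel messages only, -b -1: the previous boot
    journalctl -k -b -1 --no-pager
}

# Usage (may need root): prev_boot_kernel_log | tail -n 100
```

If that reports no earlier boot, the journal is probably volatile. With the default Storage=auto setting, creating /var/log/journal and restarting systemd-journald should make it persistent for the next incident.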

Sometimes you can also look at

less /var/log/messages

( depending on which distro you're running; on Debian and Ubuntu the file is /var/log/syslog )
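Rather than paging through the whole file, you can grep it for the usual suspects. This is only a sketch: the function name crash_hints and the pattern list are my own picks, not anything official, and a hard power loss or hardware fault may leave no trace at all.

```shell
# Search a syslog-style file for common crash indicators.
# Usage: crash_hints [logfile]
# With no argument it tries the two usual locations.
crash_hints() {
    log="${1:-}"
    if [ -z "$log" ]; then
        for candidate in /var/log/messages /var/log/syslog; do
            if [ -r "$candidate" ]; then
                log="$candidate"
                break
            fi
        done
    fi
    if [ -z "$log" ] || [ ! -r "$log" ]; then
        echo "no readable log file found" >&2
        return 1
    fi
    # -i: case-insensitive, -E: extended regular expression
    grep -iE 'out of memory|oom-killer|kernel panic|hung_task|soft lockup|I/O error' "$log"
}

# Usage (may need root): crash_hints /var/log/messages
```

An out-of-memory kill shortly before the logs stop is a common finding on container hosts, so that is the first thing I'd look for.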

If you think it's docker or kube related, ...

systemctl status -l docker

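systemctl status -l k3s

( kubectl is a command-line client, not a service; k3s itself runs as the k3s unit, or k3s-agent on worker nodes )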


I don't run Docker anymore (I use Podman), but I think the Docker logs
might be at /var/log/docker

( again it depends on which distro you use ).
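On systemd distros the service logs usually end up in the journal rather than in a file of their own, so it may be easier to ask journalctl for a unit's entries from the previous boot. A small sketch, assuming a persistent journal; the function name prev_boot_unit_log is just for illustration:

```shell
# Show a service's journal entries from the previous boot.
# Usage: prev_boot_unit_log docker
#        prev_boot_unit_log k3s
prev_boot_unit_log() {
    if [ -z "${1:-}" ]; then
        echo "usage: prev_boot_unit_log UNIT" >&2
        return 1
    fi
    # -u: restrict to one systemd unit, -b -1: the previous boot
    journalctl -u "$1" -b -1 --no-pager
}
```

The last few lines before the entries stop are usually the interesting part, so piping the output through tail is a reasonable first step.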
 
