I am at my wits' end.
We have a small compute server running Debian stable. Last time I checked, we were on the .37 kernel build.
AMD X9900, 192GB RAM, ASUS TUF B650M motherboard.
We get persistent crashes roughly once a week to once a fortnight.
Symptoms:
- System is powered on (lights, fans, etc.) but unresponsive: it won't wake up to keyboard or mouse and doesn't output anything over DP.
- Seems to happen with absolutely no warning
- journalctl never logs any problems whatsoever. Usually the last log entry is some standard UFW block message.
- System is generally not under strain when it happens (I confirm with sar logging: CPU is generally idling along, RAM usage <20%).
- There doesn't seem to be a pattern as to the time. Could be overnight, could be the middle of the day. Someone might be working on it, or it may be headless.
What I've tried:
- Updated the motherboard firmware
- Changed the GRUB config to disable USB low-power modes
- Removed the USB-ethernet dongle and went to a PCIe expansion card
- Swapped the motherboard for a later model (B850M)
- Changed PSU (like-for-like replacement, 850W, should be ample headroom for a GPU-less build)
- Installed 950W UPS
- Ran memtest86+ for 60+ hrs. No errors.
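
Since the journal shows nothing before a hang, one more thing I could try is a small heartbeat logger that appends a timestamp and the load average to a file and fsyncs it, so the last line bounds the time of the freeze more tightly than the last UFW message does. A minimal sketch in Python; the log path, the 10-second interval, and all function names are my own placeholders, not part of any existing tool:

```python
import os
import time
from datetime import datetime, timezone


def format_heartbeat(now: datetime, loadavg: tuple) -> str:
    """Build one heartbeat line: ISO timestamp plus the three load averages."""
    load = " ".join(f"{x:.2f}" for x in loadavg)
    return f"{now.isoformat(timespec='seconds')} load={load}\n"


def write_heartbeat(path: str) -> None:
    """Append one heartbeat line and force it to disk.

    The fsync is the point: without it the last lines may still sit in the
    page cache when the machine hangs and never reach the disk.
    """
    line = format_heartbeat(datetime.now(timezone.utc), os.getloadavg())
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())


def run(path: str, interval_s: float = 10.0, beats=None) -> None:
    """Write a heartbeat every interval_s seconds.

    beats=None loops forever (intended for a background service);
    an integer stops after that many heartbeats.
    """
    written = 0
    while beats is None or written < beats:
        write_heartbeat(path)
        written += 1
        if beats is None or written < beats:
            time.sleep(interval_s)
```

Calling `run("/var/log/heartbeat.log")` from a background service (the path is hypothetical) should leave the last line within roughly one interval of the hang, which could then be lined up against the sar data.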
Wtf is going on!? All help appreciated.

