Debian_SuperUser
Active Member
So I have a laptop whose processor seems to have one degraded core (this is the result of prolonged troubleshooting, not a random guess) that can't run at max frequency. It probably could if I gave it more voltage, but the damn firmware restricts access to the voltage regulator. I can either run all cores at a lower frequency, or offline that core and run the rest at max frequency. But because that core has only been taken offline and isn't actually running at 0 Hz, the system can still crash, just much more rarely.
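For anyone who wants to reproduce the offlining part, here is a minimal Python sketch of how I understand it can be scripted. It assumes the usual Linux sysfs CPU hotplug files under /sys/devices/system/cpu (needs root to write), and the helper names and the CPU number in the usage note are just placeholders, not anything specific to my machine.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")  # standard sysfs location (assumed present)


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-2,4' into [0, 1, 2, 4]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def set_cpu_online(cpu, online, sysfs=SYSFS_CPU):
    """Write 1 or 0 to cpuN/online. Needs root; cpu0 often has no such file."""
    (sysfs / f"cpu{cpu}" / "online").write_text("1" if online else "0")


if __name__ == "__main__":
    online_file = SYSFS_CPU / "online"
    if online_file.exists():
        # Read-only check: show which CPUs the kernel currently has online.
        print("online CPUs:", parse_cpu_list(online_file.read_text()))
```

Calling `set_cpu_online(3, False)` as root would take a hypothetical CPU 3 offline. As far as I can tell this only stops the scheduler from using the core, it doesn't power it down.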
When it crashes, I think the kernel internally throws a panic and restarts the system (or maybe some watchdog timer does it, idk). But if I boot with the parameters "panic=0 mce=off nomce nomca", then when the system crashes while I am in the TTY (I've not managed to see it stay alive when running a GUI), the system is actually still alive and semi-operable. I don't think nomce and nomca do anything, but I put them in just in case, and I also always like to add nmi_watchdog=0. All this time I also forgot to set the log level to max, but I think everything critical is still shown, along with all the info about it. In that state it throws "unexpected int18" errors from mce, soft lockup messages from the watchdog, and rcu_preempt reports of stalls on CPUs.
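To double check that those parameters actually reached the kernel, something like the following Python sketch should do. It only splits /proc/cmdline on whitespace, so it assumes no quoted values containing spaces, and `parse_cmdline` is just a helper name I made up. It tells you which tokens are on the command line, not whether the kernel actually honours them.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value or None for bare flags}."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        params = parse_cmdline(proc_cmdline.read_text())
        # Parameters mentioned in this post; report which ones are present.
        for name in ("panic", "mce", "nomce", "nomca", "nmi_watchdog"):
            state = "present" if name in params else "missing"
            print(name, state, params.get(name))
```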
But the last time I tried, I was actually shocked at how alive the system was. I don't exactly remember what state the system was in when I experimented a while ago (I think bash and some programs such as intel_gpu_top would work). In my most recent attempt, bash was working, intel_gpu_top was working, btop worked if I had already loaded it before the lockup, top and htop were working, and even Sway launched! I could even move my mouse. One CPU core was literally stuck, but the system was still responding to USB interrupts! Even swaybar was displaying the time.
I then tried launching Chromium, but that didn't work, and I think after a while the system froze (it also halts for a few seconds now and then, before displaying the error messages). The more complexity I have running or try to run, the more likely it is to freeze. My guess is that it comes down to luck in what gets scheduled where: anything scheduled on the stuck core can't run. I would like to know why it eventually freezes and completely halts (or who knows, maybe I just didn't wait long enough for it to recover).
This would explain why Windows is sometimes still able to render a BSOD when a hardware error occurs. The system hasn't completely halted. It's just a safety shutdown of the system when an error occurs, which means that there has to be a way to bypass it, right?
Is it actually possible to get that CPU core running, while the system has soft locked? It's not like it has stopped. I am pretty sure that it is still running (turning off individual cores isn't supported on my system, or at least by the software).
I also tried off-lining that core when soft locked, but bash just stops responding.