Linux crashing (I'm a complete noob, help)

TITAN

I believe the issue is memory related, but I do not know of a way to confirm it: memtest comes up clean.
I suspect memory because changing a few memory settings changed how the crash behaves.
Previously the machine would lock up completely and stay frozen indefinitely. It did not respond to the power button or anything else, so the only way out was to pull the plug.
Now, with the changed settings, the machine first stops responding to mouse and keyboard but otherwise keeps running (it also starts repeating the last keystroke if a text box happens to be selected). Then it freezes and the audio stops. A minute or two later the display turns off, i.e. it loses signal. About 5 minutes after that the machine shuts itself down peacefully.

I know hardware, not Linux. I don't know where to start on Linux. Please help.
 


Welcome
Nice story, but lacking substance. What is the machine? Which distribution are you having problems with? Did you get any crash reports? What were you doing when it froze? Please give as much information as you can; we are not mind readers.
 
As for when it crashes, there seems to be no rhyme or reason. No particular task triggers it. Sometimes I'll be using it and it will pop off, sometimes it will go for a few days with nothing, and sometimes I'm in bed and just hear it power down after it has sat unused for hours.
For distro, Debian 10
Sub-distro Peppermint 10.

Hardware
E5-2690 v2
Mobo: https://www.asus.com/supportonly/rampage_iv_gene/helpdesk_knowledge/
Radeon VEGA 64 (the same issue occurs with an RX550, so it's not because funny goofy Vega)
Memory: 4x8GB Kingston

This is all I know to say as of now. I'm at extreme, baby's-first-computer beginner level as far as Linux goes.
 

Attachments

  • IMG_20230305_233639.jpg
I know hardware, not Linux. I don't know where to start on Linux.
I am a hardware man myself, so I will assume you have done the usual maintenance (swapped the RAM around, pulled and reseated all connectors, etc.) to clear any possible "dry joints" that may have developed.

As it will run for several hours before it drops out, my thinking (without seeing it) is either a heat problem (PSU on the blink, or the thermal joint between the CPU and heat sink breaking down) or, worse still, a component on the MB breaking down.

The E5-2690 v2 was designed for servers.
 
Well, that's the weird part: it might be idling or at full load. If I run some kind of stress test, nothing happens; the PSU does not crumble. Basically, I have not found a way to forcefully trigger the crash.

As for thermals, CPU temps alone peak at 51°C (I use 7500rpm 92mm fans because I have a lot of hard drives). But once again, changing memory settings completely changed the crash behavior, which leads back to something memory related.
More importantly, a loose connection would show up in Memtest; Memtest is really good at catching hard faults like that.

I have thought about checking logs, but I get confused about how to when I read any documentation, since once again I'm just figuring out Linux.
 
I am a hardware man myself, so I will assume you have done the usual maintenance (swapped the RAM around, pulled and reseated all connectors, etc.) to clear any possible "dry joints" that may have developed.

As it will run for several hours before it drops out, my thinking (without seeing it) is either a heat problem (PSU on the blink, or the thermal joint between the CPU and heat sink breaking down) or, worse still, a component on the MB breaking down.

The E5-2690 v2 was designed for servers.
You sure about that? It seems more likely to me that the hard drive may be failing; it sh***s itself as soon as it tries to access corrupt data on the drive.

I find it strange that there is no kernel panic.

BTW, I've been gaming on a Xeon for 8 years now, let's not go there!
 
It's a RAID 6 of 7x 3.5" 7200rpm drives, plus 1x 15.7k rpm 3.5" drive without any form of RAID.
All integrity checks come back clean. Data corruption is not the issue. The boot drive is the 15.7k.
 
Also, what does kernel panic mean? Once again, if this is something I would have to go check on, I have not checked it. I know nothing about the software or Linux or its workings besides that it's FOSS.
 
Oh, it's like the blue screen of death for Linux.

There is one easy way to find out if your drives are really working as intended: pull the latest SMART data from them. It's really easy. Just look it up on DuckDuckGo.
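
For anyone following along, here is a minimal sketch of checking SMART health from a terminal, assuming the smartmontools package is installed (it provides `smartctl`). The helper name `smart_health` and the device paths are just illustrations, not something from this thread. Drives that sit behind a hardware RAID controller are often not visible as plain disks; for MegaRAID-style controllers smartctl has a `-d megaraid,N` device type, where N is the controller's ID for the physical disk. Which IDs exist on a given controller is something you would have to find out yourself.

```shell
# Hypothetical helper: print the overall SMART health verdict for one drive.
# $1 = device node (e.g. /dev/sda)
# $2 = optional smartctl device type (e.g. megaraid,0 for a disk behind a RAID card)
# smartctl needs root, so call this from a root shell (sudo -i).
smart_health() {
    dev="$1"
    dtype="$2"
    if [ -n "$dtype" ]; then
        smartctl -H -d "$dtype" "$dev"
    else
        smartctl -H "$dev"
    fi
}
```

Usage would be something like `smart_health /dev/sda` for a directly attached disk, or `smart_health /dev/sda megaraid,0` for the first disk behind the controller. Swapping `-H` for `-a` prints the full attribute table instead of just the verdict.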
 
Oh, you can use IT terms with me. I have been a Windows/Windows Server power user, so hardware and similar terminology is fine. I just don't know Linux things; that's the only grey, unknown area.

I will check SMART later today, but I do not believe it's going to be the drives, as I'm using a Dell H710 hardware RAID controller. It can and will kick a drive automatically and start a rebuild if SMART puts it into a preemptive failure state.

I will report back with SMART later today.
 
You sure about that? It seems more likely to me that the hard drive may be failing,
Yes, another possibility, but I made the assumption the OP had checked that.
BTW, I've been gaming on a Xeon for 8 years now, let's not go there!
Nothing wrong with that. It was designed for servers, so I would expect it to perform well [for its age] and be more reliable!
 
@TITAN
There is a possibility it may be the driver for the H710, but I would not expect it to run for several hours before crashing. Found this old article for your perusal.
 
The H710 drivers work. I can interface with it via CLI, and this controller was used in another machine, a Ryzen, for months with the same Linux distro and never had an issue. I never installed any weird drivers. I would have to go install the whole toolkit to use the CLI again, but I can confirm it works.
 
In the meantime, can someone instruct me on how to actually look at crash logs and such? Going by what I know about Windows, I'm assuming that might reveal a lot about what's happening.
 
OK, I checked the SMART data; all 8 drives are reported as healthy/NO ERROR.

I still don't know how to check the logs.
 
Do I just open them like a text document, or what?
Yes, you can open them just like a text file. Depending on your system and which log file you want to see, you may need admin or superuser (sudo) privileges.

Hint: You can also watch log files update in real time using the "tail -f <filename>" or "tail -F <filename>" command. It shows the end of the file and then keeps displaying the new entries as they are added. (CTRL-C stops it.)

Example:
Try "tail -f /var/log/syslog" from an admin account or "sudo tail -f /var/log/syslog", then wait for new entries to appear.

(The -F version detects when the system closes the old log file (because it is "full") and starts a fresh version of the log file. It will keep trying to open the new file and continue the tail. I always used "-f". It is a habit.)
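
Since a live tail is no use once the machine has frozen, here is a rough sketch for searching the saved log after a restart instead. The function name `log_errors` and the keyword list are my own guesses at useful search terms, not anything standard, so adjust them as needed.

```shell
# Hypothetical helper: show the last 40 lines of a log file that mention
# common trouble keywords (case-insensitive match).
# $1 = path to the log file
log_errors() {
    grep -iE 'error|fail|panic|mce|hardware' "$1" | tail -n 40
}
```

For example, `log_errors /var/log/syslog` from an account that is allowed to read that file. Older, rotated logs normally sit next to it with a numeric suffix, so the same function can be pointed at those too.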
 
Some logs are only accessible through the "journalctl" command and cannot be opened in a text editor. To see the logs for the current boot, run in a terminal:
Code:
journalctl -b
and page through the output with the arrow keys or space bar.
To see errors:
Code:
journalctl -b -x -p 3
There are many variations of these options, and the manpage has some examples.
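
Because the machine has to be restarted after a freeze, the interesting messages are usually in the previous boot, not the current one. A sketch of how that could look, assuming the journal is kept across reboots (on some installs it is only held in memory unless the directory /var/log/journal exists). The function name `prev_boot_errors` is made up for illustration.

```shell
# Hypothetical helper: show errors (priority 3 and worse) from an earlier boot.
# $1 = boot offset, defaults to -1 (the boot before the current one)
prev_boot_errors() {
    journalctl -b "${1:--1}" -p 3 --no-pager
}
```

Running `journalctl --list-boots` first shows which boots are actually stored. If only the current boot is listed, the journal is probably not persistent on that system and `prev_boot_errors` will have nothing to show.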

TITAN asked:
What log file should I look into for crashes?
Crashes aren't always recorded in log files because they don't tend to signal themselves, but hints about what happened may be found in log entries close to the time of the crash.
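
One way to look at the entries closest to the crash is to print only the tail end of the journal from the boot that froze. This is a sketch under the assumption that the journal survives a reboot on your system; the function name `last_before_crash` is hypothetical.

```shell
# Hypothetical helper: print the final lines the journal recorded in the
# previous boot, i.e. the entries written just before the machine went down.
# $1 = number of lines to show, defaults to 50
last_before_crash() {
    journalctl -b -1 -n "${1:-50}" --no-pager
}
```

Calling `last_before_crash 200` widens the window to the last 200 entries. If the final lines look completely ordinary and simply stop, that in itself suggests a hard hang rather than something the system had time to report.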
 
