File system keeps corrupting (need help)

My SSD shows in the Boot Menu...if it were me I'd replace the SSD.

I think this is probably a good idea. I will try running the tests that have been recommended to hopefully figure out what's going on, but unless we find something that can be fixed it seems like replacing the SSD would provide the best stability rather than trying to force this one to work.
 


The overall test passed, but a lot of information is omitted from that report. The smartctl command you need is:
Bash:
sudo smartctl -Ax /dev/nvme0n1 > ~/smart.txt
It will create a smart.txt file in your home directory. Attach it here; it's more useful than a screenshot.
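
If you want a quick look at the report before attaching it, here is a minimal sketch that pulls out the counters that tend to matter for a misbehaving NVMe drive. The function name smart_summary is made up for illustration, and the field names assume smartctl's usual NVMe "SMART/Health Information" layout, so adjust the list if your report is worded differently.

```bash
# Print selected health counters from a saved smartctl report.
# Assumption: the report uses smartctl's usual NVMe field names.
smart_summary() {
    local report="$1"
    grep -E '^(Critical Warning|Temperature|Available Spare|Percentage Used|Unsafe Shutdowns|Media and Data Integrity Errors|Error Information Log Entries):' "$report"
}
```

Usage: smart_summary ~/smart.txt. Non-zero values for "Media and Data Integrity Errors" or a large "Error Information Log Entries" count are the lines worth asking about.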

Here is an article from my bookmarks about how to interpret a SMART report:
Thank you, I have attached the file below and I will use the article you provided to hopefully figure out what any of this means.
 


I will make two suggestions.
1... As already mentioned here, run a SMART test on the drive and see whether it reports any errors at all.
2... Do not use encryption. Many times it is the cause of more problems than you can imagine; I see it every week from clients.
Thank you for the suggestions
 
Thank you! I ran the first file check you recommended in a live session, and it came back clean. Should I still try the second fsck command, or does a clean result mean the file system is alright? Afterwards I updated my outdated BIOS to the newest version and verified that the update worked. However, when I attempted to boot into Linux Mint, I ended up with a purple screen that said "KERNEL PANIC!" and included the following error: "VFS: Unable to mount root fs on unknown-block(0,0)". Also, I'm guessing most of those unsafe shutdowns happened because, with the desktop breaking, I kept getting stuck, unable to use Ctrl + Alt + Del, and had to power off with the button.
You are correct that the repairing fsck command is not needed if the first fsck check doesn't return an error.

On the kernel panic, the message shows that the kernel can't find the root filesystem. In that case the first thing to try is to tell the kernel where the root filesystem is, for example, add the kernel option:
root=/dev/<root>
where <root> is the root device shown in the lsblk output on your machine. In post #10 above, the root partition can be identified by the "/" in the MOUNTPOINTS column, so for that machine the device name would be /dev/nvme0n1p3. Make sure you use the correct device name for your own machine.

If unsure how to add a kernel option, perhaps check here: https://askubuntu.com/questions/19486/how-do-i-add-a-kernel-boot-parameter

If that works, then the machine should boot and then you can mount any other partitions you need to run. If it doesn't work, there's some other issue which will need to be attended to.
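
As a rough sketch of the lookup described above, the snippet below reads "NAME MOUNTPOINT" pairs and prints the matching kernel option. The function name root_option is hypothetical, and it assumes you feed it the output of lsblk -rno NAME,MOUNTPOINT taken from the installed system (in a live session "/" belongs to the live environment, not to your install).

```bash
# Read "NAME MOUNTPOINT" pairs on stdin and print the root= kernel option
# for the partition mounted at "/". Prints nothing if no such line exists.
root_option() {
    awk '$2 == "/" { print "root=/dev/" $1; exit }'
}
```

Usage: lsblk -rno NAME,MOUNTPOINT | root_option. For the machine in post #10 this should print root=/dev/nvme0n1p3.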
 
I added the kernel option pointing to my root drive and attempted to boot with the setting, and ended up in BusyBox with file system errors. I ran a manual fsck, which repaired corruption in what seems to be my Firefox cache, after which I was able to exit and end up on my desktop. Do you think this is something I will be able to mostly solve with my current hardware, or would I have a more stable system by getting a different SSD? I appreciate the help.

Edit: I edited the GRUB configuration to make the kernel option permanent and ran sudo update-grub to apply the changes.
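
For anyone following along, this is roughly what that permanent change can look like when scripted. It is only a sketch: the helper name add_kernel_option is made up, it assumes the Ubuntu/Mint layout where the options live on the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub, and it assumes the option contains no "|" or "&" characters. Try it on a copy of the file first.

```bash
# Append a kernel option to the GRUB_CMDLINE_LINUX_DEFAULT line of a
# grub defaults file, unless that line already contains it.
add_kernel_option() {
    local opt="$1" file="$2" tmp
    # Nothing to do if the option is already on the line.
    grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$opt" && return 0
    tmp=$(mktemp)
    # Insert the option just before the closing quote of the value.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"|\1 $opt\"|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Usage: run it against a copy of /etc/default/grub, compare the result with the original, copy it back with sudo, and then run sudo update-grub as described above.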
 
Thanks for the help everyone. I have wiped the SSD I was using for my Windows system, and I am installing Linux there, as I believe this will result in a more stable experience.
 
Until you verify that the SSD is physically OK, you can run into the same issues over and over. That is why I said to use S.M.A.R.T. on the drive and verify that it is in good working order. Otherwise you will just go in circles.
 
The other day I set up a fresh installation of Linux Mint, and after a couple of hours my desktop environment broke. After restarting, I ended up in BusyBox with EXT4 file system corruption and I/O errors. I've tried everything I could think of (which isn't a lot, I'm new to Linux), but even creating a fresh install leads to the same corruption and I/O errors. My SSD is the WD Black SN850X, and a test confirmed that there are 0 bad blocks. I'm not sure what to do next, and any help is appreciated.
Aside from the other answers, I'm gonna throw in my 2 cents, just FYI, so you know this for the future: there are cases where bad blocks won't appear in diagnostic tools, and the S.M.A.R.T. function doesn't always work (or work properly) with SSDs. It was originally intended for hard discs (you know: spinnin' discs, needles, weighs a ton). Sometimes this SMART function glitches out when used on SSDs: it reports that one of my SSDs has 48 bad sectors, and yet that SSD behaves perfectly well and a few Spyware tools (Spyware 10) said it was in perfect health.

So far the best test for SSDs I have personally found (in Linux) is to copy a large file (50-60 GB or more) from another storage device to the device you want tested using rsync. If the write speed you see is half the usual speed or less, that's a clear sign the target device is declining and/or is defective in ways that won't appear in diagnostic tools. But if the write speed is the usual, then that means the I/O is OK.
I do that test with 2 bash scripts and it takes seconds, but IDK if you're advanced enough to use such scripts.
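
To give an idea of what such a test can look like, here is a minimal sketch. It is not the pair of scripts mentioned above. The function name write_speed_test is made up, it falls back to cp when rsync is not installed, and it only reports an average over whole seconds, so it is meant for large files.

```bash
# Copy SRC to DEST, flush the writes to disk, and print the average
# write speed in MB/s (integer arithmetic, whole-second timing).
write_speed_test() {
    local src="$1" dest="$2" size_bytes start end elapsed
    size_bytes=$(wc -c < "$src")
    start=$(date +%s)
    if command -v rsync >/dev/null 2>&1; then
        rsync "$src" "$dest"
    else
        cp "$src" "$dest"   # fallback when rsync is not installed
    fi
    sync   # count the time needed to flush cached writes to the drive
    end=$(date +%s)
    elapsed=$((end - start))
    [ "$elapsed" -gt 0 ] || elapsed=1   # avoid dividing by zero on tiny files
    echo "$((size_bytes / 1000000 / elapsed)) MB/s"
}
```

Usage: write_speed_test /path/to/large-file /path/on/target-drive/large-file, where both paths are placeholders. Compare the result with what the same drive managed when it was healthy, or with the manufacturer's sustained write figure.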
 
You can have bad sectors and everything works fine. That does not mean SMART was incorrect; it means the drive is compensating for the issue. It does mean the drive is going downhill. Yes, SMART has issues at times with SSDs, but it is reading information from the chipset of the drive. It's much like when the mechanic plugs his computer into your car: he can see much of what is going on and what has gone on. Ignoring errors in SMART is like seeing the check engine light on in your car but saying it is fine because it still runs. I would love an update to SMART that does better with SSD and NVMe drives, but don't ignore what it says just because you personally do not see an issue. The drives are designed to compensate for issues rather than just fail outright.
 
I have attached the file below
I had a look at your smartctl output and the second line looks bad to me:

Media and Data Integrity Errors: 1
Error Information Log Entries: 10,013

Running the same command on my machine (1600 power-on hours, i.e. more than yours) shows no errors. Just to be sure, you can repeat the command while specifying NVMe explicitly, smartctl -Ax -d nvme /dev/nvme123, but I don't think it will show anything different. More importantly, have a look at the error log itself:

nvme error-log /dev/nvme123

It will be a long list of the same error. When the drive can recover from an error, the log says so (my log also shows a long list of recovered problems). Perhaps the log gives an idea. smartctl can read the error log as well, but I don't know the command offhand.
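
Since the log is likely to be one error repeated many times, a small tally can make it easier to read. The sketch below is an illustration only: the function name tally_error_status is made up, and it assumes each entry in the nvme error-log output has a line starting with status_field, which may differ between nvme-cli versions. As far as I know, smartctl -l error prints the same log on reasonably recent smartmontools.

```bash
# Count how often each status value occurs in `nvme error-log` output
# read from stdin, most frequent first.
# Assumption: each log entry contains a "status_field : <value>" line.
tally_error_status() {
    grep 'status_field' | sed 's/^[^:]*:[[:space:]]*//' | sort | uniq -c | sort -rn
}
```

Usage: sudo nvme error-log /dev/nvme123 | tally_error_status, with the device name replaced by your own.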

For NVMe drives, typical problems can be temperature related. I did not catch whether that was already mentioned. The drives are smart, but sustained high temperatures are a problem for them. Switching the slot, adding a heatsink, or making sure some air passes over the drive (on a desktop) may help.
 

