[Solved] Debian 12 Bookworm hard drive errors: Buffer I/O Error on device sda1, EXT4-fs error (device sda1)


WHatever

I am trying to use Debian Bookworm (version 12.4.0) on a Dell PowerEdge 1950 server with two Seagate SAS hard drives.

Everything worked perfectly until I got this error message on my laptop, which was connected to the server over SSH:

-bash: ls: command not found

I then got the following error messages on the server monitor (see image for all the errors):

Buffer I/O Error on device sda1, logical block 27322
EXT4-fs error (device sda1)
etc....
20231226_204113.jpg

see image of error messages

PS: SYSTEM EVENT LOG according to the BIOS :

PCIE Fatal Err: Critical Event sensor, (BUS 0 DEVICE 3 FUNCTION 0) was asserted.

I first powered off the server and unplugged the cables for more than 5 minutes; the error then disappeared.

From then on I tried rebooting the system and reinitializing the logical drives with the RAID controller in my BIOS.
At first I could import a "foreign" configuration (probably Debian's RAID config), but each time I had to rebuild one of the physical drives. Then only one physical drive was detected.

At last I managed to reboot correctly and to connect to the internet with the command:
"sudo dhclient enp5s0"
Unfortunately, while I was using SSH again, the error recurred with the same messages, and now I see no hard drives in my BIOS.

I tried resetting the RAID and logical drive configuration multiple times, but with no luck: nothing happens, and the "foreign" configuration appears and disappears from time to time but is not stable.

I think the root of the problem is that my disks or their connections are not stable. The other explanation could be that the disks are too old and just won't work. I would prefer a solution where I do not have to buy new disks.

PS: The BIOS system event log shows no new entries after the one quoted above.
 
I'm not running a server, so I can't be of much help there. However:

Two of our members, @dos2unix and @f33dm3bits, are I think well versed in running Linux on a server. Wait and see what they say.
 
@WHatever welcome to linux.org

You may not be aware of it, but it is incredibly bad manners to have two identical threads posted at two or more websites at the same time.

Stack Exchange's helpers cannot be expected to know what is going on over at our place, nor ours their place.

Please consider closing your thread there if you are getting helpful input here. Or vice versa.

Once we hear from you I can either leave this thread open, or close it.

TIA

Chris Turner
wizardfromoz
 
Alright, I deleted the post on Stack Exchange. Please don't close this one.

Thank you
 
If you determine that the hard disks are in good health and worth working with, but still get the error messages about finding the superblock, you could consider restoring it from one of the backup superblocks. An ext4 filesystem keeps copies of the superblock at several locations, and any of them can be used to restore it. There's an example of the process here:
It needs root access and a live disk to run from.

Bear in mind that the hardware needs to be sound for this to be an effective approach, and losing "contact" with a superblock is not a good sign in the first place. Running fsck can help, though, and so can getting bad blocks out of the way.
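
For reference, a minimal sketch of the alternate-superblock approach described above. It assumes the filesystem is the /dev/sda1 named in the error messages and that it is unmounted (for example, when booted from a live system). `backup_superblocks` is a hypothetical helper name invented for this example; `dumpe2fs` and `e2fsck -b` are the standard e2fsprogs tools.

```shell
#!/bin/sh
# Hypothetical helper: read `dumpe2fs` output on stdin and print the block
# number of every backup superblock it reports, one per line.
backup_superblocks() {
    sed -n 's/.*Backup superblock at \([0-9][0-9]*\),.*/\1/p'
}

# Intended use, as root, on an UNMOUNTED filesystem (device name is an assumption):
#   dumpe2fs /dev/sda1 2>/dev/null | backup_superblocks
#   e2fsck -b 32768 /dev/sda1   # 32768 is only an example; use a number printed above
```

If the disk itself is failing, this will not help, so it is worth checking the hardware first.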
 
It probably has to do with the fact that a few recent kernels caused ext4 filesystem corruption. If not, I/O errors are usually a sign of a disk that has failed. However, since you say you have a RAID setup, I would find it highly unlikely that both disks failed at the same time. Was the kernel recently updated, and if so, what kernel version is the system running?
 
Hi there,
I run two Linux servers in production. My first suspect is hardware failure as well. The second would be the data corruption bug in kernel 6.1.64 that was mentioned above.

It's not impossible for two drives of the same age and with the same history to die shortly one after the other, especially with extensive rebooting and rebuilding of RAID arrays.

I'd really like to see the output of
Code:
smartctl -a /dev/sda

Btw, it's new to me that removing cables while rebooting does anything. Unplugging the power cable has helped in rare cases, but I haven't heard of that for other cables yet.

Good luck
 
I am using Debian 12.4's kernel (Linux kernel 6.1.66-1, I think); see the link to the ISO in question:
cd ISO
I think the issue you are referring to was fixed in Debian 12.4:
EXT 4 fixed
12.4 release
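
To confirm which Debian kernel package is actually running rather than guessing from the ISO, the version can be pulled out of `uname -v`. A small sketch, assuming the "Debian <version>" format that Debian kernels print there. Both helper names are made up for illustration, and the check covers only the 6.1.64 version that this thread associates with the ext4 corruption.

```shell
#!/bin/sh
# Hypothetical helper: extract the Debian package version (e.g. 6.1.66-1)
# from a `uname -v` string read on stdin.
debian_kernel_pkgver() {
    sed -n 's/.*Debian \([0-9][^ ]*\).*/\1/p'
}

# Hypothetical helper: succeed (exit 0) only for a 6.1.64-* package version,
# the kernel this thread names in connection with the ext4 corruption.
is_suspect_kernel() {
    case "$1" in
        6.1.64-*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Intended use on the affected machine:
#   ver=$(uname -v | debian_kernel_pkgver)
#   if is_suspect_kernel "$ver"; then echo "suspect kernel: $ver"; else echo "kernel $ver is not 6.1.64"; fi
```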
 
Bash:
###Input1 :
sudo smartctl --all /dev/sda
###Output1 :
/dev/sda: Unknown USB bridge [...[...]]
Please specify device type with the -d option

###Input2:
sudo smartctl -t offline /dev/sda
###Output2:
/dev/sda: Unknown USB bridge [...[...]]
Please specify device type with the -d option

Indeed, the two drives have the same history and were both used in RAID 1. It seems a bit weird. I will try testing them on another machine and come back to you.
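
The "Please specify device type with the -d option" message means smartctl could not work out by itself how to talk to the device. A hedged sketch of what could be tried next: `smartctl_candidates` is a hypothetical helper that only prints commands for review, and the device types it lists (`scsi`, `sat`, `megaraid,N`) are guesses, since the thread does not say which RAID controller the server uses.

```shell
#!/bin/sh
# Hypothetical helper: print smartctl invocations worth trying for a device.
# $1 = device node, $2 = number of controller slots to probe (default 2).
# Nothing is executed; the commands are only printed.
smartctl_candidates() {
    dev=$1
    slots=${2:-2}
    echo "smartctl --scan"
    echo "smartctl -a -d scsi $dev"
    echo "smartctl -a -d sat $dev"
    n=0
    while [ "$n" -lt "$slots" ]; do
        # megaraid,N addresses physical disk N behind an LSI MegaRAID-based controller
        echo "smartctl -a -d megaraid,$n $dev"
        n=$((n + 1))
    done
}

# Intended use (run the printed commands as root):
#   smartctl_candidates /dev/sda 2
```

Because the message mentions a USB bridge, /dev/sda may be a USB-attached device rather than one of the SAS drives; `-d sat` is the type commonly tried for USB bridges.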
 
Can you do
fdisk -l
and show me the output? I'm not sure what's happening with that machine. It definitely seems like a hardware issue (it could be power, but since neither drive shows up I'd go for hardware).
 
Sure. By the way, I could not find a cheap way to test my SAS drives on my laptop, so that seems like a dead end. Also keep in mind that the server is more than 10 years old and the drives are probably just as old.

Bash:
# Input 1:
sudo fdisk -l
# Output 1:
Disk /dev/sda: 29.3 GiB 31457280000 bytes, .... (see picture because there is a lot of text)

# Input 2:
ls /dev/sd*
# Output 2:
/dev/sda /dev/sda1 /dev/sda2

20231227_150539 (1).jpg
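
Only one 29.3 GiB disk is listed here, which is worth checking against the expected SAS drives. A sketch, assuming `lsblk` from util-linux is available; `usb_disks` is a hypothetical helper that filters `lsblk -dn -o NAME,SIZE,TRAN` output for disks attached over USB. If sda shows up in its output, the drives behind the RAID controller are not visible to Debian at all.

```shell
#!/bin/sh
# Hypothetical helper: read `lsblk -dn -o NAME,SIZE,TRAN` output on stdin
# and print the names of disks whose transport column is "usb".
usb_disks() {
    awk '$3 == "usb" { print $1 }'
}

# Intended use:
#   lsblk -dn -o NAME,SIZE,TRAN | usb_disks
```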
 
Little update:
Both physical disks reappeared in the integrated BIOS configuration utility. The "foreign" configuration reappeared as well, but only with one of the disks this time. I tried to reset the config, but after refreshing, the foreign config comes back as if I had never reset it. Also, the disks may be visible here, but they are not visible from the Debian console.
 
I cleared the foreign config and tried "Create a new VD" this time. The VD is initializing, and the foreign config does not seem to come back when I refresh.
 
I'm fairly sure that the disks are not okay if they glitch in and out of existence even at the BIOS level.

Do they make sounds by any chance? A ruined disk might make clicking or other sounds.

If and when they appear at boot, try running smartctl -a on them to see if they know they're wrecked.

Otherwise, connect them to another device if you have one that takes SAS, and see if they're detected there. Undetectable drives sound ruined to me.
 
Indeed, it seems like it: one of the disks went offline during initialization. The one that went offline does make clicking sounds, by the way. All right, thank you for your help!
 
Looking at my Debian 12 installation:

Code:
ii  linux-image-6.1.0-15-amd64               6.1.66-1                             amd64        Linux 6.1 for 64-bit PCs (signed)
ii  linux-image-6.1.0-16-amd64               6.1.67-1                             amd64        Linux 6.1 for 64-bit PCs (signed)
ii  linux-image-amd64                        6.1.67-1                             amd64        Linux for 64-bit PCs (meta-package)
@debian-box:~$ uname -a
Linux debian-box 6.1.0-16-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.67-1 (2023-12-12) x86_64 GNU/Linux

Thanks for posting the link on linuxiac.com, very helpful.
 
