Solved Debian 12 bookworm hard drives error buffer I/O Error on device sda1 EXT4-fs error (device sda1)

WHatever · Dec 26, 2023

I am trying to run Debian Bookworm (version 12.4.0) on a Dell PowerEdge 1950 server with two Seagate SAS hard drives.

Everything worked perfectly until I got an error message on my laptop (which was connected to the server over SSH):

-bash: ls: command not found

I then got the following error messages on the server monitor (see image for all the errors):

Buffer I/O Error on device sda1, logical block 27322
EXT4-fs error (device sda1)
etc....

see image of error messages

PS: SYSTEM EVENT LOG according to the BIOS :

PCIE Fatal Err: Critical Event sensor, (BUS 0 DEVICE 3 FUNCTION 0) was asserted.

I first powered off the server and unplugged the cables for more than 5 minutes; the error then disappeared.

From then on I tried rebooting the system and reinitializing the logical drives with the RAID controller utility in my BIOS.
At first I could import a "foreign" configuration (probably Debian's RAID config), but each time I had to rebuild one of the physical drives. Then only one physical drive was detected.

At last I managed to reboot correctly and to connect to the internet with the command:
"sudo dhclient enp5s0"
Unfortunately, while using SSH again, the error reoccurred with the same messages, and now I see no hard drives in my BIOS.

I tried resetting the RAID and logical drive configuration multiple times, but with no luck: nothing happens, and the "foreign" configuration appears and disappears from time to time but is not stable.

Actually, I think the root of the problem is that my disks or their connections are not stable. The other explanation could be that the disks are too old and just won't work. I would prefer a solution where I do not have to buy new disks.

PS: The BIOS shows no new SYSTEM EVENT LOG entries after the previous one.

Alexzee · Dec 26, 2023

I'm not running a server so not much help there, however:

I think our two members @dos2unix and @f33dm3bits are well versed in running Linux on a server. Wait and see what they say.

osprey · Dec 26, 2023

Consider running fsck on all the disks, and checking their health with smartctl from the smartmontools package ... from a live disk perhaps. The intermittent nature of the issues suggests hardware issues.
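
A minimal sketch of those two checks, meant to be run as root from a live system. The device names passed in are placeholders, and `smart_verdict` is a hypothetical helper; it only decodes two bits of the exit-status bitmask described in the smartctl man page (bit 1: device could not be opened, bit 3: SMART status reports "DISK FAILING").

```shell
#!/bin/sh
# Decode the smartctl exit status (a bitmask) into a rough verdict.
smart_verdict() {
    status=$1
    if [ $((status & 8)) -ne 0 ]; then
        echo "failing"       # bit 3: SMART status check returned DISK FAILING
    elif [ $((status & 2)) -ne 0 ]; then
        echo "unreadable"    # bit 1: device open failed
    elif [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "other"         # other bits set: read the smartctl output itself
    fi
}

# Read-only health check of one disk and one of its partitions.
# Usage (as root): check_disk /dev/sdX /dev/sdX1   (placeholder names)
check_disk() {
    disk=$1
    part=$2
    smartctl -H "$disk"
    rc=$?
    echo "SMART verdict for $disk: $(smart_verdict "$rc")"
    # -n answers "no" to every repair prompt, so nothing is changed on disk.
    fsck -n "$part"
}
```

Nothing here repairs anything; once the hardware looks trustworthy, a real repair run would drop the `-n`.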

Also asked here: https://unix.stackexchange.com/ques...r-buffer-i-o-error-on-device-sda1-ext4-fs-err

wizardfromoz · Dec 27, 2023

@WHatever welcome to linux.org

You may not be aware of it, but it is incredibly bad manners to have two identical threads posted at two or more websites at the same time.

Stack Exchange's helpers cannot be expected to know what is going on over at our place, nor ours their place.

Please consider closing your thread there if you are getting helpful input here. Or vice versa.

Once we hear from you I can either leave this thread open, or close it.

TIA

Chris Turner
wizardfromoz

WHatever · Dec 27, 2023

wizardfromoz said:
@WHatever welcome to linux.org

You may not be aware of it, but it is incredibly bad manners to have two identical threads posted at two or more websites at the same time.

Stack Exchange's helpers cannot be expected to know what is going on over at our place, nor ours their place.

Please consider closing your thread there if you are getting helpful input here. Or vice versa.

Once we hear from you I can either leave this thread open, or close it.

TIA

Chris Turner
wizardfromoz

Alright, I deleted the post on Stack Exchange; please don't close this one.

Thank you

osprey · Dec 27, 2023

If you determine that the hard disks are in good health and worth working with, but still get messages about errors finding the superblock, you could consider replacing it with one of the alternative superblocks: ext filesystems keep backup copies of the superblock at several locations on the disk, and any of them can be used to restore it. There's an example of the process here:

How do I fix a superblock in Linux?

Answer (1 of 2): If your system will give you a terminal type the following command, else boot Linux system from rescue disk (boot from 1st CD/DVD. At boot: prompt type command linux rescue). Mount partition using alternate superblock Find out superblock location for /dev/sda2: # dumpe2fs /dev...

www.quora.com

It needs root to operate and a live disk.

Bear in mind that the hardware needs to be sound for this to be an effective approach, and the loss of "contact" with a superblock is not a good sign in the first place. Using fsck though can help, and getting bad blocks out of the way can also help.
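
As a rough sketch of that process, assuming an ext2/3/4 filesystem on a placeholder partition such as /dev/sdXN and a root shell on a live system: `dumpe2fs` lists where the backup superblocks are, and `e2fsck -b` checks the filesystem using one of them. The `backup_superblocks` and `check_with_backup` helpers are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Print the block number of every backup superblock found in
# `dumpe2fs` output read from stdin, one per line.
backup_superblocks() {
    sed -n 's/^ *Backup superblock at \([0-9][0-9]*\),.*/\1/p'
}

# Run a read-only check against the first backup superblock.
# Usage (as root, filesystem unmounted): check_with_backup /dev/sdXN
check_with_backup() {
    part=$1
    sb=$(dumpe2fs "$part" 2>/dev/null | backup_superblocks | head -n 1)
    if [ -z "$sb" ]; then
        echo "no backup superblock listed for $part" >&2
        return 1
    fi
    # -n keeps the check read-only; drop it only once the hardware is trusted.
    e2fsck -n -b "$sb" "$part"
}
```

If dumpe2fs cannot read the filesystem at all, `mke2fs -n` (a dry run that writes nothing) prints the locations it would use, but those only match if the filesystem was created with the same parameters.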

f33dm3bits · Dec 27, 2023

It may be related to the ext4 filesystem corruption caused by a few recent kernels; if not, I/O errors are usually a sign of a failed disk. However, since you say you have a RAID setup, I find it highly unlikely that both disks failed at the same time. Was the kernel recently updated, and if so, what kernel version is the system running?

Ext4 data corruption in stable kernels [LWN.net]

lwn.net

#1057843 - linux: ext4 data corruption in 6.1.64-1 - Debian Bug report logs

bugs.debian.org

Hold Off Debian Upgrades: Kernel 6.1.64 ext4 Bug Alert

A data corruption bug in Linux kernel 6.1.64-1 with possible data loss on ext4 file systems, delaying the release of Debian 12.3.

linuxiac.com

Haui111 · Dec 27, 2023

Hi there,
I run two Linux servers in production. My first suspect is hardware failure as well. The second would be the data corruption bug in kernel 6.1.64 that was mentioned above.

It's not impossible that two drives of the same age and with the same history die shortly one after the other, especially with extensive rebooting and rebuilding of RAID arrays.

I'd really like to see the output of

Code:

smartctl -a /dev/sda

Btw, it's new to me that removing cables while rebooting does anything. Unplugging the power cable has helped in rare cases, but I haven't heard of other cables making a difference yet.

Good luck

WHatever · Dec 27, 2023

osprey said:
Consider running fsck on all the disks, and checking their health with smartctl from the smartmontools package ... from a live disk perhaps. The intermittent nature of the issues suggests hardware issues.

Also asked here: https://unix.stackexchange.com/ques...r-buffer-i-o-error-on-device-sda1-ext4-fs-err

Running this and "fdisk -l" shows only the USB key I use for live booting; no disks are visible.

WHatever · Dec 27, 2023

f33dm3bits said:
It may be related to the ext4 filesystem corruption caused by a few recent kernels; if not, I/O errors are usually a sign of a failed disk. However, since you say you have a RAID setup, I find it highly unlikely that both disks failed at the same time. Was the kernel recently updated, and if so, what kernel version is the system running?

Ext4 data corruption in stable kernels [LWN.net]

lwn.net

#1057843 - linux: ext4 data corruption in 6.1.64-1 - Debian Bug report logs

bugs.debian.org

Hold Off Debian Upgrades: Kernel 6.1.64 ext4 Bug Alert

A data corruption bug in Linux kernel 6.1.64-1 with possible data loss on ext4 file systems, delaying the release of Debian 12.3.

linuxiac.com

I am using Debian 12.4's kernel (Linux kernel 6.1.66-1, I think); see the link to the ISO in question:
cd ISO
I think the issue you are referring to was fixed in Debian 12.4:
EXT 4 fixed
12.4 release

WHatever · Dec 27, 2023

Haui111 said:
Hi there,
I run two Linux servers in production. My first suspect is hardware failure as well. The second would be the data corruption bug in kernel 6.1.64 that was mentioned above.

It's not impossible that two drives of the same age and with the same history die shortly one after the other, especially with extensive rebooting and rebuilding of RAID arrays.

I'd really like to see the output of

Code:

smartctl -a /dev/sda

Btw, it's new to me that removing cables while rebooting does anything. Unplugging the power cable has helped in rare cases, but I haven't heard of other cables making a difference yet.

Good luck

Bash:

###Input1 :
sudo smartctl --all /dev/sda
###Output1 :
/dev/sda: Unknown USB bridge [...[...]]
Please specify device type with the -d option

###Input2:
sudo smartctl -t offline /dev/sda
###Output2:
/dev/sda: Unknown USB bridge [...[...]]
Please specify device type with the -d option

Indeed, the two drives have the same history and were both used in RAID 1. It seems a bit weird. I will try testing them on another machine and come back to you.
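
For what it's worth, the "Unknown USB bridge" message means smartctl was pointed at a USB device, here most likely the live USB key, which took the name /dev/sda. Below is a sketch of how one might probe with explicit device types instead. The `-d` values tried and the `megaraid,0` disk index are assumptions that depend on the controller, and `needs_device_type` and `probe_smart` are hypothetical helpers.

```shell
#!/bin/sh
# Succeeds if smartctl output (on stdin) asks for an explicit -d type.
needs_device_type() {
    grep -q 'Please specify device type with the -d option'
}

# Try smartctl with auto-detection first, then with a few explicit types.
# Usage (as root): probe_smart /dev/sdX   (placeholder name)
probe_smart() {
    dev=$1
    out=$(smartctl -a "$dev" 2>&1)
    if ! printf '%s\n' "$out" | needs_device_type; then
        printf '%s\n' "$out"      # no -d type was requested
        return 0
    fi
    for type in sat scsi megaraid,0; do
        out=$(smartctl -a -d "$type" "$dev" 2>&1)
        rc=$?
        # Bits 0 and 1 of the exit status mean "bad command line" and
        # "device open failed"; if neither is set, the device was opened.
        if [ $((rc & 3)) -eq 0 ]; then
            echo "device type '$type' worked for $dev"
            printf '%s\n' "$out"
            return 0
        fi
    done
    echo "no device type worked for $dev; see 'smartctl --scan'" >&2
    return 1
}
```

`smartctl --scan` lists the devices and types smartctl detects on its own, which is a useful starting point before guessing.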

Haui111 · Dec 27, 2023

WHatever said:
Bash:

###Input1 :
sudo smartctl --all /dev/sda
###Output1 :
/dev/sda: Unknown USB bridge [...[...]]
Please specify device type with the -d option

###Input2:
sudo smartctl -t offline /dev/sda
###Output2:
/dev/sda: Unknown USB bridge [...[...]]
Please specify device type with the -d option

Indeed, the two drives have the same history and were both used in RAID 1. It seems a bit weird. I will try testing them on another machine and come back to you.

Can you run
fdisk -l
and show me the output? I'm not sure what's happening with that machine. It definitely seems like a hardware issue (it could be power, but since neither disk shows up I'd go for hardware).

WHatever · Dec 27, 2023

Haui111 said:
Can you run
fdisk -l
and show me the output? I'm not sure what's happening with that machine. It definitely seems like a hardware issue (it could be power, but since neither disk shows up I'd go for hardware).

Sure. By the way, I could not find a cheap way to test my SAS drives on my laptop, so that seems like a dead end. Also keep in mind that the server is more than 10 years old and the drives are probably as old.

Bash:

#IN1:
sudo fdisk -l
#Out1:
Disk /dev/sda: 29.3 GiB 31457280000 bytes, .... (see picture bcse there is a lot of text)

#in2:
ls /dev/sd*
#Out2:
/dev/sda /dev/sda1 /dev/sda2
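
Since the live USB key shows up as /dev/sda here, one way to tell it apart from the server's own disks is the transport column of `lsblk`. A small sketch follows; `internal_disks` and `list_internal_disks` are hypothetical helpers, and the sketch assumes `lsblk` reports `usb` as the transport for the key.

```shell
#!/bin/sh
# Read "NAME TRAN" pairs on stdin and print the names of
# devices whose transport is anything other than usb.
internal_disks() {
    awk '$2 != "usb" { print $1 }'
}

# List whole devices (-d), without a header (-n), skipping loop
# devices (-e 7), and keep only those that are not USB devices.
list_internal_disks() {
    lsblk -dn -e 7 -o NAME,TRAN | internal_disks
}
```

Devices with an empty transport column are kept on purpose, since volumes presented by a RAID controller may not report one. Optical drives can also appear in the list. An empty result would match the output above: the controller is not presenting any disk to the kernel.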

WHatever · Dec 27, 2023

Haui111 said:
Can you run
fdisk -l
and show me the output? I'm not sure what's happening with that machine. It definitely seems like a hardware issue (it could be power, but since neither disk shows up I'd go for hardware).

Little update:
Both physical disks reappeared in the integrated BIOS configuration utility. The "foreign" configuration reappeared as well, but with only one of the disks this time. I tried to reset the config, but after refreshing, the foreign config comes back as if I had never reset it. Also, the disks may be visible here, but they are not visible from the Debian console.

WHatever · Dec 27, 2023

I cleared the foreign config and tried "Create a new VD" this time. The VD is initializing, and the foreign config does not seem to come back when I refresh.

Haui111 · Dec 27, 2023

WHatever said:
Little update:
Both physical disks reappeared in the integrated BIOS configuration utility. The "foreign" configuration reappeared as well, but with only one of the disks this time. I tried to reset the config, but after refreshing, the foreign config comes back as if I had never reset it. Also, the disks may be visible here, but they are not visible from the Debian console.

I'm fairly sure that the disks are not okay if they glitch in and out of existence even at the BIOS level.

Do they make sounds by any chance? A ruined disk might make clicking or other sounds.

If and when they appear at boot, try running smartctl -a on them to see if they know they're wrecked.

Otherwise, connect them to another device that takes SAS, if you have one, and see if they're detected there. Drives that cannot be detected sound ruined to me.

WHatever · Dec 27, 2023

Haui111 said:
I'm fairly sure that the disks are not okay if they glitch in and out of existence even at the BIOS level.

Do they make sounds by any chance? A ruined disk might make clicking or other sounds.

If and when they appear at boot, try running smartctl -a on them to see if they know they're wrecked.

Otherwise, connect them to another device that takes SAS, if you have one, and see if they're detected there. Drives that cannot be detected sound ruined to me.

Indeed, it seems like it: one of the disks went offline during initialization. The one that went offline does make clicking sounds, by the way. All right, thank you for your help!

Alexzee · Dec 27, 2023

f33dm3bits said:
It may be related to the ext4 filesystem corruption caused by a few recent kernels; if not, I/O errors are usually a sign of a failed disk. However, since you say you have a RAID setup, I find it highly unlikely that both disks failed at the same time. Was the kernel recently updated, and if so, what kernel version is the system running?

Ext4 data corruption in stable kernels [LWN.net]

lwn.net

#1057843 - linux: ext4 data corruption in 6.1.64-1 - Debian Bug report logs

bugs.debian.org

Hold Off Debian Upgrades: Kernel 6.1.64 ext4 Bug Alert

A data corruption bug in Linux kernel 6.1.64-1 with possible data loss on ext4 file systems, delaying the release of Debian 12.3.

linuxiac.com

Looking at my Debian 12 installation:

Code:

ii  linux-image-6.1.0-15-amd64               6.1.66-1                             amd64        Linux 6.1 for 64-bit PCs (signed)
ii  linux-image-6.1.0-16-amd64               6.1.67-1                             amd64        Linux 6.1 for 64-bit PCs (signed)
ii  linux-image-amd64                        6.1.67-1                             amd64        Linux for 64-bit PCs (meta-package)
@debian-box:~$ uname -a
Linux debian-box 6.1.0-16-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.67-1 (2023-12-12) x86_64 GNU/Linux
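
A small sketch of the same check in script form, assuming the `uname -a` layout shown above (package version right after the word "Debian"). The helper names are hypothetical, and the only version flagged is 6.1.64-1, the one named in the bug reports linked earlier in the thread.

```shell
#!/bin/sh
# Extract the Debian kernel package version from a `uname -a` line on stdin.
debian_kernel_version() {
    sed -n 's/.* Debian \([^ ]*\) .*/\1/p'
}

# Succeeds only for the package version named in Debian bug #1057843.
is_affected_version() {
    [ "$1" = "6.1.64-1" ]
}

# Report whether the running kernel is the flagged package version.
report_kernel() {
    version=$(uname -a | debian_kernel_version)
    if is_affected_version "$version"; then
        echo "kernel $version is the release named in the ext4 corruption bug"
    else
        echo "kernel ${version:-unknown} is not 6.1.64-1"
    fi
}
```

Calling `report_kernel` on the machine in question prints one line; an exact string match is deliberately conservative and does not try to order versions.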

Thanks for posting the link on linuxiac.com, very helpful.
