I have a server with two mdraid arrays and a few standalone disks attached. I use two of the disks to do cold backups of one of the mdraids.
What I do is shut down the server, attach the two backup disks, boot the server, run my backup, unmount the two disks, shut down, remove the two disks and put them away, then start the server back up. This is a monthly operation.
Lately I am seeing a very strange behaviour with the two backup disks: after a time their device file assignments change while the server is running.
To be clear:
I am NOT talking about rebooting the server and finding the disks at different device files. I mean the server is running uninterrupted, the disks are mounted and at some point the device files for the disks change.
I am talking about the disks being on, say, /dev/sdk and /dev/sdl after starting the machine up, and then an hour or two later I find the two disks at /dev/sdq and /dev/sdr.
I noticed this because the backup script started failing with disk write errors, and after investigating I realized that my backup disks had moved to new device files, so the mounts were invalid!
Here is a log of what happens. In a nutshell, I shut the machine down, attach the backup disks (a pair of 24TB Barracudas), boot it up, and then do what you see in the following session. Note how the two 24TB disks change from being at /dev/sdk and /dev/sdl to being at /dev/sdq and /dev/sdr!
# Immediately after attaching the backup disks (the pair of 24TB disks) and booting:
root@5600x:~# parted -l 2>&1 | grep Disk | grep -v Flags | sort
Disk /dev/md125: 48.0TB
Disk /dev/md127: 2000GB
Disk /dev/nvme0n1: 1024GB
Disk /dev/nvme1n1: 4001GB
Disk /dev/sda: 8002GB
Disk /dev/sdb: 8002GB
Disk /dev/sdc: 8002GB
Disk /dev/sdd: 8002GB
Disk /dev/sde: 1024GB
Disk /dev/sdf: 500GB
Disk /dev/sdg: 1000GB
Disk /dev/sdh: 1000GB
Disk /dev/sdi: 1000GB
Disk /dev/sdj: 1000GB
Disk /dev/sdk: 24.0TB
Disk /dev/sdl: 24.0TB
Disk /dev/sdm: 8002GB
Disk /dev/sdn: 8002GB
Disk /dev/sdo: 8002GB
Disk /dev/sdp: 8002GB
root@5600x:~# mkdir /tmp/24T_1 /tmp/24T_2
root@5600x:~# mount /dev/sdk1 /tmp/24T_1
root@5600x:~# mount /dev/sdl1 /tmp/24T_2
root@5600x:~# df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 6.3G 4.8M 6.3G 1% /run
efivarfs 128K 58K 66K 47% /sys/firmware/efi/efivars
/dev/sdf2 457G 17G 417G 4% /
tmpfs 32G 1.1M 32G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sdf1 1.1G 6.2M 1.1G 1% /boot/efi
/dev/sde1 938G 807G 84G 91% /home
/dev/md127 1.8T 1.4T 406G 77% /data2
/dev/nvme1n1p1 3.6T 1.9T 1.6T 55% /data3
/dev/md125 44T 32T 10T 76% /data
tmpfs 6.3G 16K 6.3G 1% /run/user/0
/dev/sdk1 22T 28K 21T 1% /tmp/24T_1
/dev/sdl1 22T 28K 21T 1% /tmp/24T_2
root@5600x:~# ls -lFa /tmp/24T_?
/tmp/24T_1:
total 32
drwxr-xr-x 3 root root 4096 Oct 27 00:54 ./
drwxrwxrwt 13 root root 12288 Oct 27 00:51 ../
drwx------ 2 root root 16384 Oct 27 00:27 lost+found/
/tmp/24T_2:
total 32
drwxr-xr-x 3 root root 4096 Oct 27 00:54 ./
drwxrwxrwt 13 root root 12288 Oct 27 00:51 ../
drwx------ 2 root root 16384 Oct 27 00:29 lost+found/
root@5600x:~# mkdir /tmp/24T_1/data /tmp/24T_2/data
root@5600x:~# ls -lFa /tmp/24T_?
/tmp/24T_1:
total 36
drwxr-xr-x 4 root root 4096 Oct 27 00:55 ./
drwxrwxrwt 13 root root 12288 Oct 27 00:55 ../
drwxr-xr-x 2 root root 4096 Oct 27 00:55 data/
drwx------ 2 root root 16384 Oct 27 00:27 lost+found/
/tmp/24T_2:
total 36
drwxr-xr-x 4 root root 4096 Oct 27 00:55 ./
drwxrwxrwt 13 root root 12288 Oct 27 00:55 ../
drwxr-xr-x 2 root root 4096 Oct 27 00:55 data/
drwx------ 2 root root 16384 Oct 27 00:29 lost+found/
root@5600x:~# df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 6.3G 4.8M 6.3G 1% /run
efivarfs 128K 58K 66K 47% /sys/firmware/efi/efivars
/dev/sdf2 457G 17G 417G 4% /
tmpfs 32G 1.1M 32G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sdf1 1.1G 6.2M 1.1G 1% /boot/efi
/dev/sde1 938G 807G 84G 91% /home
/dev/md127 1.8T 1.4T 406G 77% /data2
/dev/nvme1n1p1 3.6T 1.9T 1.6T 55% /data3
/dev/md125 44T 32T 10T 76% /data
tmpfs 6.3G 16K 6.3G 1% /run/user/0
/dev/sdk1 22T 28K 21T 1% /tmp/24T_1
/dev/sdl1 22T 28K 21T 1% /tmp/24T_2
root@5600x:~# nohup cp -r /data/export /tmp/24T_1/data &
[1] 3718
root@5600x:~# nohup: ignoring input and appending output to 'nohup.out'
root@5600x:~# nohup cp -r /data/neural /tmp/24T_2/data &
[2] 3732
root@5600x:~# nohup: ignoring input and appending output to 'nohup.out'
root@5600x:~# date
Mon Oct 27 12:58:42 AM MDT 2025
< Go do something else for a while >
root@5600x:~# date
Mon Oct 27 02:30:38 AM MDT 2025
root@5600x:~# parted -l 2>&1 | grep Disk | grep -v Flags | sort
Disk /dev/md125: 48.0TB
Disk /dev/md127: 2000GB
Disk /dev/nvme0n1: 1024GB
Disk /dev/nvme1n1: 4001GB
Disk /dev/sda: 8002GB
Disk /dev/sdb: 8002GB
Disk /dev/sdc: 8002GB
Disk /dev/sdd: 8002GB
Disk /dev/sde: 1024GB
Disk /dev/sdf: 500GB
Disk /dev/sdg: 1000GB
Disk /dev/sdh: 1000GB
Disk /dev/sdi: 1000GB
Disk /dev/sdj: 1000GB
Disk /dev/sdm: 8002GB
Disk /dev/sdn: 8002GB
Disk /dev/sdo: 8002GB
Disk /dev/sdp: 8002GB
Disk /dev/sdq: 24.0TB
Disk /dev/sdr: 24.0TB
nohup.out is full of errors saying the targets of the copy commands are invalid and cannot be written.
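The kernel log for the window between the two date stamps is the obvious next place to look. Below is a sketch of how I would filter it. The keyword list is only my guess at what a disconnect and re-detection might look like, not something taken from real output, and disk_events is a helper name I made up.

```shell
#!/bin/sh
# disk_events: read kernel log lines on stdin and keep only those that mention
# the old or new device names, or one of a few guessed keywords.
# The pattern is an assumption; adjust it to whatever the log really contains.
disk_events() {
    # "|| true" keeps the function from failing when nothing matches.
    grep -Ei 'sd[klqr]|reset|offline|removed|attached' || true
}

# Intended use on the server (times taken from the two date calls in the log):
#   journalctl -k --since "2025-10-27 00:58" --until "2025-10-27 02:31" | disk_events
```

If that turns up nothing, the filter is probably too narrow and reading the whole window unfiltered would be the fallback.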
To explain the command line "parted -l 2>&1 | grep Disk | grep -v Flags | sort" that I use to list the disks: parted prints error messages for the 8TB mdraid member disks because I built the array on raw disks, not partitions. parted does not know what to make of them and prints errors about unknown partition type, which gets confusing, so I merge stderr into stdout (2>&1) and let grep drop those lines. The grep for Disk and the grep -v Flags are merely there to get a list of disk sizes per device file without the manufacturer or model info that parted prints.
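To make the change easier to spot, the two listings in the log can be compared mechanically. This is only a sketch: changed_disks is a helper name I made up, and the two files are assumed to hold the output of the parted pipeline saved at two points in time.

```shell
#!/bin/sh
# changed_disks BEFORE AFTER
# Print the device lines that vanished from, or appeared in, the second listing.
# Both files are assumed to hold output of:
#   parted -l 2>&1 | grep Disk | grep -v Flags | sort
changed_disks() {
    before=$1
    after=$2
    # Whole-line, fixed-string match: lines of BEFORE that are absent from AFTER.
    grep -Fxv -f "$after" "$before" | sed 's/^/gone:  /'
    # Lines of AFTER that are absent from BEFORE.
    grep -Fxv -f "$before" "$after" | sed 's/^/new:   /'
    return 0
}

# Intended use:
#   parted -l 2>&1 | grep Disk | grep -v Flags | sort > before.txt
#   ... wait ...
#   parted -l 2>&1 | grep Disk | grep -v Flags | sort > after.txt
#   changed_disks before.txt after.txt
```

Fed the two listings from the log, this should report /dev/sdk and /dev/sdl as gone and /dev/sdq and /dev/sdr as new.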
To explain the disks:
The eight 8002GB disks are the members of /dev/md125, an 8 disk RAID6 mdraid.
These 8 disks are all attached to the motherboard SATA ports.
The NVMe drives just store somewhat hot, read-active data.
The four 1000GB disks are SATA SSDs, members of /dev/md127, a 4 disk RAID 10 mdraid.
Disk sde1 is mounted on /home.
Disk sdf2 is mounted on / (sdf1 is the EFI partition at /boot/efi).
The two wandering 24TB disks start out at /dev/sdk and /dev/sdl and later show up at /dev/sdq and /dev/sdr.
These 8 disks are attached to an LSI 9500-8i HBA.
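Since the kernel names are evidently not stable here, one thing I am considering is having the backup script refer to the disks through the persistent symlinks under /dev/disk/by-id and check where they currently point. A sketch, assuming udev populates that directory on this system; kernel_name is a hypothetical helper and the link name in the usage comment is a placeholder. This would only tell me where a disk is now. It would not keep an existing mount valid if the disk drops off and is re-detected.

```shell
#!/bin/sh
# kernel_name LINK
# Resolve a persistent symlink (for example one under /dev/disk/by-id) to the
# kernel device node it currently points at. Fails if the link is missing.
kernel_name() {
    link=$1
    if [ ! -e "$link" ]; then
        echo "kernel_name: $link does not exist" >&2
        return 1
    fi
    readlink -f "$link"
}

# Intended use (the link name below is a placeholder, not a real id):
#   kernel_name "/dev/disk/by-id/<id-of-backup-disk>-part1"
```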
I have never seen anything like this before, and I am at a loss.
How can a disk have its device file assignment changed?
How can I prevent this from happening?
Any wisdom would be much appreciated.

