How to discover a failed disk in a HW RAID?

postcd

Guest
Hello, how can I find out from Linux whether an HDD in the HW RAID (RAID 1) has failed or needs to be replaced?

smartctl did not work; it told me it is not supported on that drive (the drive appears as "Virtual" / /dev/mapper/vg-root) ...

I read that one can install the RAID controller vendor's software to monitor it, but the one I found (https://centosfreaks.wordpress.com/2012/01/09/monitor-hw-raid-with-dell-openmanage/) uses an HTTP control panel. I need something simple, a plain command-line tool, as I don't want to mess up the server.

I tried installing the "lshw" tool (lshw is a small tool to extract detailed information on the hardware configuration of the machine).

Among other things, it returns:
*-pci:3
description: PCI bridge
product: 82801I (ICH9 Family) PCI Express Port 1
vendor: Intel Corporation
physical id: 1c
bus info: pci@0000:00:1c.0
version: 02
width: 32 bits
clock: 33MHz
capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:60 ioport:f000(size=4096) memory:df000000-df1fffff memory:d0000000-d01fffff(prefetchable)
*-scsi
description: SCSI storage controller

product: SAS1068E PCI-Express Fusion-MPT SAS
vendor: LSI Logic / Symbios Logic
physical id: 0
bus info: pci@0000:02:00.0
logical name: scsi0
version: 08
width: 64 bits
clock: 33MHz
capabilities: scsi pm pciexpress msi msix bus_master cap_list rom scsi-host
configuration: driver=mptsas latency=0
resources: irq:16 ioport:fc00(size=256) memory:df1fc000-df1fffff memory:df1e0000-df1effff memory:df000000-df0fffff(prefetchable)
*-disk:0 UNCLAIMED
description: ATA Disk
product: MM1000EBKAF
physical id: 0.0.0
bus info: scsi@0:0.0.0
version: HPG0
serial: 9XG08Y90
capacity: 931GiB (1TB)
capabilities: 15000rpm
configuration: ansiversion=5
*-disk:1 UNCLAIMED
description: ATA Disk
product: MM1000EBKAF
physical id: 0.1.0
bus info: scsi@0:0.1.0
version: HPG0
serial: 9XG09346
capacity: 931GiB (1TB)
capabilities: 15000rpm
configuration: ansiversion=5
*-disk:2
description: SCSI Disk
product: VIRTUAL DISK
vendor: Dell
physical id: 1.0.0
bus info: scsi@0:1.0.0
logical name: /dev/sda
version: 1028
size: 930GiB (999GB)
capacity: 930GiB (999GB)
capabilities: 15000rpm partitioned partitioned:dos
configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 signature=00063e01
*-volume:0
description: Linux filesystem partition
vendor: Linux
physical id: 1
bus info: scsi@0:1.0.0,1
logical name: /dev/sda1
logical name: /boot
version: 1.0
serial: 2ab84f72-7ace-4849-b911-c04fcf526b54
size: 250MiB
capacity: 250MiB
capabilities: primary bootable extended_attributes ext2 initialized
configuration: filesystem=ext2 modified=2014-12-15 13:11:29 mount.fstype=ext2 mount.options=rw,relatime,errors=continue mounted=2014-11-02 22:17:29 state=mounted
*-volume:1
description: Linux LVM Physical Volume partition
physical id: 2
bus info: scsi@0:1.0.0,2
logical name: /dev/sda2
serial: GDGvSj-Pc8R-P4Zz-amVN-q5fe-AVn1-oO3cfG
size: 930GiB
capacity: 930GiB
capabilities: primary multi lvm2

Any ideas what to run or install so I can tell exactly whether a RAID HDD has failed and requires replacement? Thanks.

Update:
When I googled "SAS1068E PCI-Express Fusion-MPT SAS" I found this post, where they talk about the "mpt-status" tool. So I googled it along with my Linux distro name, downloaded the version suitable for my system, and installed it with: rpm -ivh repourlpath. It then required daemonize, so I installed that through yum. After that I needed to run "modprobe mptctl". Now when I run the "mpt-status -i 0" command, it returns:
ioc0 vol_id 0 type IM, 2 phy, 930 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 9 ATA MM1000EBKAF HPG0, 931 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 1 ATA MM1000EBKAF HPG0, 931 GB, state ONLINE, flags NONE
Is this not enough to tell whether an HDD is eligible for replacement by the datacenter staff? Thanks.
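
For anyone who wants to check this from a script: the mpt-status output above carries a "state" field for the volume and for each physical disk. Below is a minimal Python sketch that parses those lines and reports anything that is not in the healthy states seen above (OPTIMAL for the volume, ONLINE for the disks). The line format is taken only from the three lines quoted in this post, so it may not match every mpt-status version; the function name find_problems and the idea that any other state value means "needs attention" are my own assumptions, not something from the tool's documentation.

```python
import re

# Patterns derived from the three mpt-status lines quoted in this thread.
# Assumption: other versions of the tool may format lines differently.
VOL_RE = re.compile(r"^(ioc\d+) vol_id (\d+) .*\bstate (\w+), flags (.+)$")
PHY_RE = re.compile(
    r"^(ioc\d+) phy (\d+) scsi_id (\d+) (.+?), \d+ GB, state (\w+), flags (.+)$"
)

# Output copied from the post above (healthy array).
SAMPLE = """\
ioc0 vol_id 0 type IM, 2 phy, 930 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 9 ATA MM1000EBKAF HPG0, 931 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 1 ATA MM1000EBKAF HPG0, 931 GB, state ONLINE, flags NONE
"""


def find_problems(output):
    """Return a list of messages for volumes/disks not in a healthy state.

    Assumption: a volume is healthy only when its state is OPTIMAL and a
    physical disk only when its state is ONLINE; anything else is reported.
    """
    problems = []
    for line in output.splitlines():
        line = line.strip()
        vol = VOL_RE.match(line)
        if vol:
            ioc, vol_id, state, _flags = vol.groups()
            if state != "OPTIMAL":
                problems.append(f"{ioc} volume {vol_id} is {state}")
            continue
        phy = PHY_RE.match(line)
        if phy:
            ioc, phy_id, scsi_id, model, state, _flags = phy.groups()
            if state != "ONLINE":
                problems.append(
                    f"{ioc} phy {phy_id} (scsi_id {scsi_id}, {model}) is {state}"
                )
    return problems


if __name__ == "__main__":
    # With the healthy sample from this thread, nothing is reported.
    print(find_problems(SAMPLE))  # prints []
```

On a real server one would feed it the live output instead of SAMPLE, for example by capturing "mpt-status -i 0" with subprocess.run (as root, after "modprobe mptctl"), and mail or log the result from cron when the list is not empty.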
 

If this is a hardware RAID setup (I assume so), then you have to get the RAID tool from the hardware manufacturer and use it to determine the RAID status.

If this is software RAID (or a hardware utility is not available), you can use the hdparm and dmraid tools.

Normally for HW RAID you have to reboot into the RAID configuration utility to detect bad drives, or stick to the HW manufacturer's software.

Something to look at:
http://wiki.contribs.org/Raid:LSI_Monitoring

And if you search for lsiutils you may find some other software worth looking at.
*Edit: the lsi website is back up...
http://www.lsi.com/products/io-controllers/pages/lsi-sas-1068e.aspx#tab/tab4
 
