LVMRAID(7)                                                                                                 LVMRAID(7)



NAME
       lvmraid — LVM RAID


DESCRIPTION
       LVM RAID is a way to create logical volumes (LVs) that use multiple physical devices to improve performance or
       tolerate device failure.  How blocks of data in an LV are placed onto physical devices is  determined  by  the
       RAID  level.   RAID  levels  are  commonly  referred  to by number, e.g. raid1, raid5.  Selecting a RAID level
       involves tradeoffs among physical device requirements, fault tolerance, and performance.  A description of the
       RAID levels can be found at
       www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf

       LVM  RAID  uses both Device Mapper (DM) and Multiple Device (MD) drivers from the Linux kernel.  DM is used to
       create and manage visible LVM devices, and MD is used to place data on physical devices.


Create a RAID LV
       To create a RAID LV, use lvcreate and specify an LV type.  The LV type corresponds to a RAID level.  The basic
       RAID levels that can be used are: raid0, raid1, raid4, raid5, raid6, raid10.

       lvcreate --type RaidLevel [OPTIONS] --name Name --size Size VG [PVs]

       To display the LV type of an existing LV, run:

       lvs -o name,segtype VG/LV

       (The LV type is also referred to as "segment type" or "segtype".)

       LVs can be created with the following types:


   raid0


       Also  called striping, raid0 spreads LV data across multiple devices in units of stripe size.  This is used to
       increase performance.  LV data will be lost if any of the devices fail.

       lvcreate --type raid0 [--stripes Number --stripesize Size] VG [PVs]


       --stripes specifies the number of devices to spread the LV across.


       --stripesize specifies the size of each stripe in kilobytes.  This is the amount of data that  is  written  to
              one device before moving to the next.

       PVs specifies the devices to use.  If not specified, lvm will choose Number devices, one for each stripe.


   raid1


       Also called mirroring, raid1 uses multiple devices to duplicate LV data.  The LV data remains available if all
       but one of the devices fail.  The minimum number of devices required is 2.

       lvcreate --type raid1 [--mirrors Number] VG [PVs]
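

   raid4


       raid4 is a form of striping that uses an extra device dedicated to storing parity blocks.  The LV data remains
       available if one device fails.  The parity is used to recalculate data that is lost from a single device.  The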
       minimum number of devices required is 3.

       lvcreate --type raid4 [--stripes Number --stripesize Size] VG [PVs]


       --stripes specifies the number of devices to use for LV data.  This does not include the extra device lvm adds
              for storing parity blocks.  Number stripes requires Number+1 devices.  Number must be 2 or more.


       --stripesize specifies the size of each stripe in kilobytes.  This is the amount of data that  is  written  to
              one device before moving to the next.

       PVs specifies the devices to use.  If not specified, lvm will choose Number+1 separate devices.

       raid4 is called non-rotating parity because the parity blocks are always stored on the same device.


   raid5


       raid5  is  a  form of striping that uses an extra device for storing parity blocks.  LV data and parity blocks
       are stored on each device.  The LV data remains available if one device fails.  The parity is used to
       recalculate data that is lost from a single device.  The minimum number of devices required is 3.

       lvcreate --type raid5 [--stripes Number --stripesize Size] VG [PVs]


       --stripes specifies the number of devices to use for LV data.  This does not include the extra device lvm adds
              for storing parity blocks.  Number stripes requires Number+1 devices.  Number must be 2 or more.


       --stripesize specifies the size of each stripe in kilobytes.  This is the amount of data that  is  written  to
              one device before moving to the next.

       PVs specifies the devices to use.  If not specified, lvm will choose Number+1 separate devices.

       raid5  is  called  rotating  parity because the parity blocks are placed on different devices in a round-robin
       sequence.  There are variations of raid5 with different algorithms for placing the parity blocks.  The default
       variant  is  raid5_ls (raid5 left symmetric, which is a rotating parity 0 with data restart).  See RAID5
       variants below.


   raid6


       raid6 is a form of striping like raid5, but uses two extra devices for parity  blocks.   LV  data  and  parity
       blocks  are  stored  on  each device.  The LV data remains available if up to two devices fail.  The parity is
       used to recalculate data that is lost from one or two devices.  The minimum number of devices required is 5.

       lvcreate --type raid6 [--stripes Number --stripesize Size] VG [PVs]


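       --stripes specifies the number of devices to use for LV data.  This does not include the extra two devices lvm
              adds for storing parity blocks.  Number stripes requires Number+2 devices.  Number must be 3 or more.
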

   raid10


       raid10 is a combination of raid1 and raid0, striping data across mirrored devices.  LV data remains  available
       if one or more devices remains in each mirror set.  The minimum number of devices required is 4.

       lvcreate --type raid10
              [--mirrors NumberMirrors]
              [--stripes NumberStripes --stripesize Size]
              VG [PVs]


       --mirrors  specifies  the  number  of mirror images within each stripe.  e.g.  --mirrors 1 means there are two
              images of the data, the original and one mirror image.


       --stripes specifies the total number of devices to use in all raid1 images (not the number of raid1 devices to
              spread  the  LV across, even though that is the effective result).  The number of devices in each raid1
              mirror will be NumberStripes/(NumberMirrors+1), e.g. mirrors 1 and stripes 4 will  stripe  data  across
              two raid1 mirrors, where each mirror consists of two devices.


       --stripesize  specifies  the  size of each stripe in kilobytes.  This is the amount of data that is written to
              one device before moving to the next.

       PVs specifies the devices to use.  If not specified, lvm will choose the necessary devices.  Devices are  used
       to  create mirrors in the order listed, e.g. for mirrors 1, stripes 2, listing PV1 PV2 PV3 PV4 results in
       mirrors PV1/PV2 and PV3/PV4.

       RAID10 is not mirroring on top of stripes; that arrangement would be RAID01, which is less tolerant of device
       failures.
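
       The device requirements described for each RAID level can be restated as a small shell sketch.  The helper
       names raid_devices and mirror_sets are hypothetical and are not part of LVM; they only repeat the arithmetic
       from this section.  For raid10 the sketch assumes NumberStripes * (NumberMirrors + 1) devices, which agrees
       with the PV1 PV2 PV3 PV4 example (mirrors 1, stripes 2) but is an assumption, not a documented formula.

```shell
# Hypothetical helpers (not LVM commands) that restate the device
# arithmetic from the RAID level descriptions.

# raid_devices LEVEL [STRIPES] [MIRRORS]
# Prints the number of devices the given lvcreate options would need.
raid_devices() {
    level=$1
    stripes=${2:-0}
    mirrors=${3:-0}
    case "$level" in
        raid0)       echo "$stripes" ;;                    # one device per stripe
        raid1)       echo $((mirrors + 1)) ;;              # original image plus mirror images
        raid4|raid5) echo $((stripes + 1)) ;;              # one extra device for parity
        raid6)       echo $((stripes + 2)) ;;              # two extra devices for parity
        raid10)      echo $((stripes * (mirrors + 1))) ;;  # assumption, see the note above
        *)           echo "unknown RAID level: $level" >&2; return 1 ;;
    esac
}

# mirror_sets MIRRORS PV...
# Groups PVs into raid10 mirror sets in the order listed, one set per line.
mirror_sets() {
    images=$(($1 + 1))
    shift
    members=""
    count=0
    for pv in "$@"; do
        members="${members:+$members/}$pv"
        count=$((count + 1))
        if [ "$count" -eq "$images" ]; then
            echo "$members"
            members=""
            count=0
        fi
    done
}

raid_devices raid5 2             # prints 3
raid_devices raid6 3             # prints 5
mirror_sets 1 PV1 PV2 PV3 PV4    # prints PV1/PV2, then PV3/PV4
```

       The result of raid_devices can be compared against the number of PVs with free space in the VG before
       running lvcreate.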



Synchronization
       Synchronization makes all the devices in a RAID LV consistent with each other.

       In a RAID1 LV, all mirror images should have the same data.  When a new mirror image is  added,  or  a  mirror
       image  is missing data, then images need to be synchronized.  Data blocks are copied from an existing image to
       a new or outdated image to make them match.

       In a RAID 4/5/6 LV, parity blocks and data blocks should match based on  the  parity  calculation.   When  the
       devices  in  a RAID LV change, the data and parity blocks can become inconsistent and need to be synchronized.
       Correct blocks are read, parity is calculated, and recalculated blocks are written.

       The RAID implementation keeps track of which parts of a RAID LV are synchronized.  This uses a bitmap saved in
       the  RAID metadata.  The bitmap can exclude large parts of the LV from synchronization to reduce the amount of
       work.  Without this, the entire LV would need to be synchronized every time it was activated.  When a RAID LV
       is first created and activated, the first synchronization is called initialization.

       Automatic  synchronization  happens when a RAID LV is activated, but it is usually partial because the bitmaps
       reduce the areas that are checked.  A full sync may become necessary when devices in the RAID LV are changed.

       The synchronization status of a RAID LV is reported by the following command, where "image synced" means  sync
       of problems may be undetected by automatic synchronization which excludes areas outside  of  the  RAID  write-
       intent bitmap.

       The command to scrub a RAID LV can operate in two different modes:

       lvchange --syncaction check|repair VG/LV


       check Check mode is read-only and only detects inconsistent areas in the RAID LV; it does not correct them.


       repair Repair mode checks and writes corrected blocks to synchronize any inconsistent areas.


       Scrubbing  can  consume  a  lot of bandwidth and slow down application I/O on the RAID LV.  To control the I/O
       rate used for scrubbing, use:


       --maxrecoveryrate Rate[b|B|s|S|k|K|m|M|g|G]
              Sets the maximum recovery rate for a RAID LV.  Rate is specified as  an  amount  per  second  for  each
              device in the array.  If no suffix is given, then KiB/sec/device is assumed.  Setting the recovery rate
              to 0 means it will be unbounded.


       --minrecoveryrate Rate[b|B|s|S|k|K|m|M|g|G]
              Sets the minimum recovery rate for a RAID LV.  Rate is specified as  an  amount  per  second  for  each
              device in the array.  If no suffix is given, then KiB/sec/device is assumed.  Setting the recovery rate
              to 0 means it will be unbounded.


       To display the current scrubbing in progress on an LV, including the syncaction  mode  and  percent  complete,
       run:

       lvs -a -o name,raid_sync_action,sync_percent

       After scrubbing is complete, to display the number of inconsistent blocks found, run:

       lvs -o name,raid_mismatch_count

       Also, if mismatches were found, the lvs attr field will display the letter "m" (mismatch) in the 9th position,
       e.g.

       # lvs -o name,vgname,segtype,attr vg/lvol0
         LV    VG   Type  Attr
         lvol0 vg   raid1 Rwi-a-r-m-
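
       The scrubbing steps above (start a check, wait for it to finish, read the mismatch count) can be combined
       in a short shell sketch.  The function name scrub_check and the variable SCRUB_POLL_SECONDS are
       hypothetical.  The sketch assumes that raid_sync_action reports "idle" once scrubbing has finished and that
       the check is already reported as running when lvchange returns; neither is stated in this page, so verify
       both on the system in use.

```shell
# Hypothetical wrapper around the scrubbing commands shown above.
# Usage: scrub_check VG/LV
scrub_check() {
    lv=$1

    # Start a read-only scrub of the RAID LV.
    lvchange --syncaction check "$lv" || return 1

    # Poll until the reported sync action is "idle" (assumed value).
    while :; do
        action=$(lvs --noheadings -o raid_sync_action "$lv") || return 1
        [ "$(printf '%s' "$action" | tr -d ' ')" = "idle" ] && break
        sleep "${SCRUB_POLL_SECONDS:-10}"
    done

    # Print the number of inconsistent blocks found by the check.
    lvs --noheadings -o raid_mismatch_count "$lv" | tr -d ' '
}

# Example call (requires root and an existing RAID LV):
#   scrub_check vg/lvol0
```

       A non-zero result only says that inconsistent blocks exist; see Scrubbing Limitations for what the count
       cannot tell.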



   Scrubbing Limitations
       The check mode can only report the number of inconsistent blocks; it cannot report which blocks are
       inconsistent.  This makes it impossible to know which device has errors, or whether the errors affect file
       system data, metadata or nothing at all.

       The repair mode can make the RAID LV data consistent, but it does not know which data is correct.  The  result


SubLVs
       An  LV  is  often a combination of other hidden LVs called SubLVs.  The SubLVs either use physical devices, or
       are built from other SubLVs themselves.  SubLVs hold LV data blocks, RAID parity blocks,  and  RAID  metadata.
       SubLVs are generally hidden, so the lvs -a option is required to display them:

       lvs -a -o name,segtype,devices

       SubLV names begin with the visible LV name, and have an automatic suffix indicating the SubLV's role:


       ·  SubLVs  holding LV data or parity blocks have the suffix _rimage_#.  These SubLVs are sometimes referred to
          as DataLVs.


       ·  SubLVs holding RAID metadata have the suffix _rmeta_#.  RAID metadata includes superblock information, RAID
          type, bitmap, and device health information.  These SubLVs are sometimes referred to as MetaLVs.


       SubLVs are an internal implementation detail of LVM.  The way they are used, constructed and named may change.
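
       As an illustration of the naming scheme, the following shell sketch prints the SubLV names that would be
       expected for a given LV name and image count.  The function sublv_names is hypothetical and only reflects
       the suffixes described above; because the naming is an implementation detail, real names should always be
       read from lvs -a.  The third argument lets the caller omit the _rmeta_# names, as in the raid0 case where
       no RAID metadata is used.

```shell
# Hypothetical helper: list the expected SubLV names for an LV.
# Usage: sublv_names LV IMAGES [yes|no]
#   LV      visible LV name
#   IMAGES  number of rimage SubLVs
#   yes|no  whether rmeta SubLVs exist (default: yes)
sublv_names() {
    lv=$1
    images=$2
    with_meta=${3:-yes}

    # SubLVs holding LV data or parity blocks.
    i=0
    while [ "$i" -lt "$images" ]; do
        echo "${lv}_rimage_${i}"
        i=$((i + 1))
    done

    # SubLVs holding RAID metadata.
    if [ "$with_meta" = "yes" ]; then
        i=0
        while [ "$i" -lt "$images" ]; do
            echo "${lv}_rmeta_${i}"
            i=$((i + 1))
        done
    fi
}

sublv_names lvr1 2
# prints lvr1_rimage_0, lvr1_rimage_1, lvr1_rmeta_0, lvr1_rmeta_1 (one per line)
```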

       The following examples show the SubLV arrangement for each of the basic RAID LV types, using the fewest number
       of devices allowed for each.


   Examples
       raid0
       Each rimage SubLV holds a portion of LV data.  No parity is used.  No RAID metadata is used.

       lvcreate --type raid0 --stripes 2 --name lvr0 ...

       lvs -a -o name,segtype,devices
         lvr0            raid0  lvr0_rimage_0(0),lvr0_rimage_1(0)
         [lvr0_rimage_0] linear /dev/sda(...)
         [lvr0_rimage_1] linear /dev/sdb(...)

       raid1
       Each rimage SubLV holds a complete copy of LV data.  No parity is used.  Each rmeta SubLV holds RAID metadata.

       lvcreate --type raid1 --mirrors 1 --name lvr1 ...

       lvs -a -o name,segtype,devices
         lvr1            raid1  lvr1_rimage_0(0),lvr1_rimage_1(0)
         [lvr1_rimage_0] linear /dev/sda(...)
         [lvr1_rimage_1] linear /dev/sdb(...)
         [lvr1_rmeta_0]  linear /dev/sda(...)
         [lvr1_rmeta_1]  linear /dev/sdb(...)

       raid4
       Two rimage SubLVs each hold a portion of LV data and one rimage SubLV holds parity.  Each  rmeta  SubLV  holds
       RAID metadata.

       lvcreate --type raid4 --stripes 2 --name lvr4 ...

       raid5
       Three rimage SubLVs each hold a portion of LV data and parity.  Each rmeta SubLV holds RAID metadata.

       lvcreate --type raid5 --stripes 2 --name lvr5 ...

       lvs -a -o name,segtype,devices
         lvr5            raid5  lvr5_rimage_0(0),\
                                lvr5_rimage_1(0),\
                                lvr5_rimage_2(0)
         [lvr5_rimage_0] linear /dev/sda(...)
         [lvr5_rimage_1] linear /dev/sdb(...)
         [lvr5_rimage_2] linear /dev/sdc(...)
         [lvr5_rmeta_0]  linear /dev/sda(...)
         [lvr5_rmeta_1]  linear /dev/sdb(...)
         [lvr5_rmeta_2]  linear /dev/sdc(...)

       raid6
       Six rimage SubLVs each hold a portion of LV data and parity.  Each rmeta SubLV holds RAID metadata.

       lvcreate --type raid6 --stripes 4 --name lvr6

       lvs -a -o name,segtype,devices
         lvr6            raid6  lvr6_rimage_0(0),\
                                lvr6_rimage_1(0),\
                                lvr6_rimage_2(0),\
                                lvr6_rimage_3(0),\
                                lvr6_rimage_4(0),\
                                lvr6_rimage_5(0)
         [lvr6_rimage_0] linear /dev/sda(...)
         [lvr6_rimage_1] linear /dev/sdb(...)
         [lvr6_rimage_2] linear /dev/sdc(...)
         [lvr6_rimage_3] linear /dev/sdd(...)
         [lvr6_rimage_4] linear /dev/sde(...)
         [lvr6_rimage_5] linear /dev/sdf(...)
         [lvr6_rmeta_0]  linear /dev/sda(...)
         [lvr6_rmeta_1]  linear /dev/sdb(...)
         [lvr6_rmeta_2]  linear /dev/sdc(...)
         [lvr6_rmeta_3]  linear /dev/sdd(...)
         [lvr6_rmeta_4]  linear /dev/sde(...)
         [lvr6_rmeta_5]  linear /dev/sdf(...)

       raid10
       Four rimage SubLVs each hold a portion of LV data.  No parity is used.
       Each rmeta SubLV holds RAID metadata.

       lvcreate --type raid10 --stripes 2 --mirrors 1 --name lvr10

       lvs -a -o name,segtype,devices
         lvr10            raid10 lvr10_rimage_0(0),\
                                 lvr10_rimage_1(0),\
                                 lvr10_rimage_2(0),\
                                 lvr10_rimage_3(0)
         [lvr10_rimage_0] linear /dev/sda(...)
         [lvr10_rimage_1] linear /dev/sdb(...)

       operating  in  a degraded mode, without losing LV data, even after a device fails.  The number of devices that
       can fail without the loss of LV data depends on the RAID level:


       ·  RAID0 (striped) LVs cannot tolerate losing any devices.  LV data will be lost if any devices fail.


       ·  RAID1 LVs can tolerate losing all but one device without LV data loss.


       ·  RAID4 and RAID5 LVs can tolerate losing one device without LV data loss.


       ·  RAID6 LVs can tolerate losing two devices without LV data loss.


       ·  RAID10 is variable, and depends on which devices are lost.  It can tolerate losing all but one device in  a
          single raid1 mirror without LV data loss.
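
       The list above can be restated as a lookup.  This is a minimal sketch with a hypothetical function name,
       tolerated_failures; it returns the number of device failures an LV can survive without LV data loss, and
       the word "variable" for raid10, where the answer depends on which devices are lost.

```shell
# Hypothetical helper: how many device failures a RAID LV tolerates.
# Usage: tolerated_failures LEVEL [DEVICES]
#   DEVICES is only needed for raid1 (number of images).
tolerated_failures() {
    level=$1
    devices=${2:-0}
    case "$level" in
        raid0)       echo 0 ;;                  # any failure loses LV data
        raid1)       echo $((devices - 1)) ;;   # all but one device
        raid4|raid5) echo 1 ;;
        raid6)       echo 2 ;;
        raid10)      echo "variable" ;;         # depends on which devices fail
        *)           echo "unknown RAID level: $level" >&2; return 1 ;;
    esac
}

tolerated_failures raid1 3    # prints 2
tolerated_failures raid6      # prints 2
```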


       If  a  RAID LV is missing devices, or has other device-related problems, lvs reports this in the health_status
       (and attr) fields:

       lvs -o name,lv_health_status

       partial
       Devices are missing from the LV.  This is also indicated by the letter "p" (partial) in the  9th  position  of
       the lvs attr field.

       refresh needed
       A  device  was  temporarily  missing  but  has returned.  The LV needs to be refreshed to use the device again
       (which will usually require partial synchronization).  This is also  indicated  by  the  letter  "r"  (refresh
       needed)  in the 9th position of the lvs attr field.  See Refreshing an LV.  This could also indicate a problem
       with the device, in which case it should be replaced; see Replacing Devices.

       mismatches exist
       See Scrubbing.
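
       Because each of these states is also shown in the 9th position of the lvs attr field, a script can decode
       it directly.  The sketch below uses a hypothetical function, attr_health, and covers only the letters
       described in this page ("p", "r" and "m"); the label "none" for a "-" in that position is a choice made
       for this example.

```shell
# Hypothetical helper: decode the 9th character of an lvs attr string.
# Usage: attr_health ATTR
attr_health() {
    flag=$(printf '%s' "$1" | cut -c9)
    case "$flag" in
        p) echo "partial" ;;
        r) echo "refresh needed" ;;
        m) echo "mismatches exist" ;;
        -) echo "none" ;;
        *) echo "other ($flag)" ;;
    esac
}

attr_health "Rwi-a-r-m-"    # prints: mismatches exist
attr_health "Rwi-a-r-r-"    # prints: refresh needed
```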

       Most commands will also print a warning if a device is missing, e.g.
       WARNING: Device for PV uItL3Z-wBME-DQy0-... not found or rejected ...

       This warning will go away if the device returns or is removed from the VG (see vgreduce --removemissing).



   Activating an LV with missing devices
       A RAID LV that is missing devices may be activated  or  not,  depending  on  the  "activation  mode"  used  in
       lvchange:

       lvchange -ay --activationmode {complete|degraded|partial} VG/LV

       complete
       The LV is only activated if all devices are present.

       lvmconfig --type default activation/activation_mode


   Replacing Devices
       Devices  in a RAID LV can be replaced with other devices in the VG.  When replacing devices that are no longer
       visible on the system, use lvconvert --repair.  When replacing devices that are still visible,  use  lvconvert
       --replace.  The repair command will attempt to restore the same number of data LVs that were previously in the
       LV.  The replace option can be repeated to replace multiple PVs.  Replacement devices can be optionally listed
       with either option.

       lvconvert --repair VG/LV [NewPVs]

       lvconvert --replace OldPV VG/LV [NewPV]

       lvconvert --replace OldPV1 --replace OldPV2 VG/LV [NewPVs]

       New devices require synchronization with existing devices, see Synchronization.


   Refreshing an LV
       Refreshing  a  RAID  LV clears any transient device failures (device was temporarily disconnected) and returns
       the LV to its fully redundant mode.  Restoring a device will usually require at least partial  synchronization
       (see Synchronization).  Failure to clear a transient failure results in the RAID LV operating in degraded mode
       until it is reactivated.  Use the lvchange command to refresh an LV:

       lvchange --refresh VG/LV

       # lvs -o name,vgname,segtype,attr,size vg
         LV    VG   Type  Attr       LSize
         raid1 vg   raid1 Rwi-a-r-r- 100.00g

       # lvchange --refresh vg/raid1

       # lvs -o name,vgname,segtype,attr,size vg
         LV    VG   Type  Attr       LSize
         raid1 vg   raid1 Rwi-a-r--- 100.00g


   Automatic repair
       If a device in a RAID LV fails, device-mapper in the kernel notifies the dmeventd(8) monitoring  process  (see
       Monitoring).  dmeventd can be configured to automatically respond using:

       lvm.conf(5) activation/raid_fault_policy

       Possible settings are:

       warn
       A  warning  is  added to the system log indicating that a device has failed in the RAID LV.  It is left to the
       user to repair the LV, e.g.  replace failed devices.

       allocate
       dmeventd automatically attempts to repair the LV using spare devices in the VG.  Note that  even  a  transient
       failure is handled as a permanent failure; a new device is allocated and full synchronization is started.

       If specific PVs in a RAID LV are known to have corrupt data, the data on those PVs can be reconstructed with:

       lvchange --rebuild PV VG/LV

       The rebuild option can be repeated with different PVs to replace the data on multiple PVs.



Monitoring
       When a RAID LV is activated the dmeventd(8) process is started to monitor  the  health  of  the  LV.   Various
       events  detected  in  the  kernel  can  cause  a  notification to be sent from device-mapper to the monitoring
       process, including device failures and synchronization completion (e.g.  for initialization or scrubbing).

       The LVM configuration file contains options that affect how the monitoring process  will  respond  to  failure
       events  (e.g.  raid_fault_policy).   It is possible to turn on and off monitoring with lvchange, but it is not
       recommended to turn this off unless you have a thorough knowledge of the consequences.



Configuration Options
       There are a number of options in the LVM configuration file that affect the behavior of RAID LVs.  The tunable
       options are listed below.  A detailed description of each can be found in the LVM configuration file itself.
               mirror_segtype_default
               raid10_segtype_default
               raid_region_size
               raid_fault_policy
               activation_mode



RAID1 Tuning
       A  RAID1  LV  can be tuned so that certain devices are avoided for reading while all devices are still written
       to.

       lvchange --[raid]writemostly PhysicalVolume[:{y|n|t}] VG/LV

       The specified device will be marked as "write mostly", which means that  reading  from  this  device  will  be
       avoided,  and other devices will be preferred for reading (unless no other devices are available).  This
       minimizes the I/O to the specified device.

       If the PV name has no suffix, the write mostly attribute is set.  If the PV name has the suffix :n, the  write
       mostly attribute is cleared, and the suffix :t toggles the current setting.
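
       The suffix rules can be sketched as a small shell function.  The name writemostly_action is hypothetical,
       and treating the suffix :y the same as no suffix is an assumption based on the {y|n|t} syntax above.

```shell
# Hypothetical helper: report what a --writemostly argument would do.
# Usage: writemostly_action PhysicalVolume[:{y|n|t}]
writemostly_action() {
    case "$1" in
        *:n) echo "clear" ;;
        *:t) echo "toggle" ;;
        *:y) echo "set" ;;     # assumption: :y behaves like no suffix
        *)   echo "set" ;;     # no suffix sets the attribute
    esac
}

writemostly_action /dev/sdb      # prints: set
writemostly_action /dev/sdb:n    # prints: clear
writemostly_action /dev/sdb:t    # prints: toggle
```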

       The write mostly option can be repeated on the command line to change multiple devices at once.

       To  report  the  current write mostly setting, the lvs attr field will show the letter "w" in the 9th position
       when write mostly is set:

       lvs -a -o name,attr

       When a device is marked write mostly, the maximum number of outstanding writes to that device can be
       configured.  Once the maximum is reached, further writes become synchronous.  When synchronous, a write to
       the LV will not complete until writes to all the mirror images are complete.

       level is usually done to increase or decrease resilience to device failures.  This is done using lvconvert and
       specifying the new RAID level as the LV type:

       lvconvert --type RaidLevel VG/LV [PVs]

       The most common and recommended RAID takeover conversions are:


       linear to raid1
              Linear is a single image of LV data, and converting it to raid1 adds a mirror image which is  a  direct
              copy of the original linear image.


       striped/raid0 to raid4/5/6
              Adding parity devices to a striped volume results in raid4/5/6.


       Unnatural conversions that are not recommended include converting between striped and non-striped types.  This
       is because file systems often optimize I/O patterns based on device striping values.  If those values  change,
       it can decrease performance.

       Converting to a higher RAID level requires allocating new SubLVs to hold RAID metadata, and new SubLVs to hold
       parity blocks for LV data.  Converting to a lower RAID level removes the SubLVs that are no longer needed.

       Conversion often requires full synchronization of the RAID LV  (see  Synchronization).   Converting  to  RAID1
       requires  copying  all  LV  data  blocks  to  a  new image on a new device.  Converting to a parity RAID level
       requires reading all LV data blocks, calculating parity, and writing the new parity  blocks.   Synchronization
       can take a long time and degrade performance (rate controls also apply to conversion; see --maxrecoveryrate).


       The following takeover conversions are currently possible:

       ·  between linear and raid1.

       ·  between striped and raid4.


   Examples
       1. Converting an LV from linear to raid1.

       # lvs -a -o name,segtype,size vg
         LV   Type   LSize
         lv   linear 300.00g

       # lvconvert --type raid1 --mirrors 1 vg/lv

       # lvs -a -o name,segtype,size vg
         LV            Type   LSize
         lv            raid1  300.00g
         [lv_rimage_0] linear 300.00g
         [lv_rimage_1] linear 300.00g
         [lv_rmeta_0]  linear   3.00m
         [lv_rmeta_1]  linear   3.00m

         LV            Type   LSize
         lv            raid1  100.00g
         [lv_rimage_0] linear 100.00g
         [lv_rimage_1] linear 100.00g
         [lv_rmeta_0]  linear   3.00m
         [lv_rmeta_1]  linear   3.00m

       3. Converting an LV from linear to raid1 (with 3 images).

       Start with a linear LV:

       # lvcreate -L1G -n my_lv vg

       Convert the linear LV to raid1 with three images
       (original linear image plus 2 mirror images):

       # lvconvert --type raid1 --mirrors 2 vg/my_lv




RAID Reshaping
       RAID reshaping is changing attributes of a RAID LV while keeping the same RAID level, i.e. changes that do not
       involve changing the number of devices.  This includes  changing  RAID  layout,  stripe  size,  or  number  of
       stripes.

       When  changing  the  RAID  layout or stripe size, no new SubLVs (MetaLVs or DataLVs) need to be allocated, but
       DataLVs are extended by a small amount (typically 1 extent).  The extra space allows blocks in a stripe to  be
       updated safely, and not corrupted in case of a crash.  If a crash occurs, reshaping can just be restarted.

       (If  blocks  in  a  stripe  were  updated  in place, a crash could leave them partially updated and corrupted.
       Instead, an existing stripe is quiesced, read, changed in layout, and the new stripe written  to  free  space.
       Once that is done, the new stripe is unquiesced and used.)

       (The reshaping features are planned for a future release.)



RAID5 Variants
       raid5_ls
       · RAID5 left symmetric
       · Rotating parity N with data restart

       raid5_la
       · RAID5 left asymmetric
       · Rotating parity N with data continuation

       raid5_rs
       · RAID5 right symmetric
       · Rotating parity 0 with data restart

       raid5_ra
       · RAID5 right asymmetric
       · Rotating parity 0 with data continuation

       raid6_nr
       · RAID6 N restart (aka right symmetric)
       · Rotating parity N with data restart

       raid6_nc
       · RAID6 N continue
       · Rotating parity N with data continuation





History
       The  2.6.38-rc1  version  of the Linux kernel introduced a device-mapper target to interface with the software
       RAID (MD) personalities.  This provided device-mapper with RAID 4/5/6 capabilities and  a  larger  development
       community.  Later, support for RAID1, RAID10, and RAID1E (RAID 10 variants) was added.  Support for these new
       kernel RAID targets was added to LVM version 2.02.87.  The capabilities of the LVM raid1 type  have  surpassed
       the  old  mirror type.  raid1 is now recommended instead of mirror.  raid1 became the default for mirroring in
       LVM version 2.02.100.




Red Hat, Inc                           LVM TOOLS 2.02.166(2)-RHEL7 (2016-11-16)                            LVMRAID(7)