Thursday, August 16, 2012

How to remove an FC LUN from a running RedHat 6 server.

This quick howto shows how to remove a fibre channel LUN under multipathd(8) control from a running RedHat Enterprise Linux 6 machine. Be careful when performing online storage modifications. Make sure you have a valid backup. And of course I can't be held responsible for any problems if you follow these steps ;)

In our example, we have an LVM volume mounted on /export/oracle which is under multipathd(8) control. We will remove this volume from the server without taking the machine down.

So first, make sure the mount point is not used anymore. Check your applications and users and remove all references to this device.

If the volume is mounted, check whether it's still in use and, if not, unmount it. The fuser(1) and lsof(1) commands can tell you if the device is in use. Don't forget that if this file system is shared via NFS, you will need to stop the NFS daemons before you can umount(1) it.
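
On RHEL 6, assuming the stock init scripts, stopping the NFS daemons would look like this :

sudo service nfs stop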

df -h /export/oracle

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/ora-bckp  2.0T  1.4T  509G  74% /export/oracle


sudo fuser -m /export/oracle
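
If fuser(1) comes back empty, lsof(1) can double-check for open files on the mount point :

sudo lsof /export/oracle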


Now unmount the file system.

sudo umount /export/oracle

From the df(1) command above, we saw that the /export/oracle file system is in fact an LVM logical volume called « bckp » from the volume group « ora ». Let's take a look at the LVM configuration for both of these objects, starting with the logical volume.


sudo lvs ora/bckp
  LV   VG   Attr      LSize Pool Origin Data%  Move Log Cpy%Sync Convert
  bckp ora  -wi-a---- 2.00t                                            

Then the volume group.

sudo vgs ora
  VG   #PV #LV #SN Attr   VSize VFree
  ora    4   1   0 wz--n- 2.00t    0 

And finally, the physical devices.


sudo pvs | egrep 'PV|ora'

  PV                   VG   Fmt  Attr PSize   PFree
  /dev/mapper/backup01 ora  lvm2 a--  512.00g    0 
  /dev/mapper/backup02 ora  lvm2 a--  512.00g    0 
  /dev/mapper/backup03 ora  lvm2 a--  512.00g    0 
  /dev/mapper/backup04 ora  lvm2 a--  512.00g    0 


Take note of these four LVM physical devices; we will use this info later. But first, we must remove the logical volume and then the volume group from LVM.
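
If you'd rather not be prompted about removing an active logical volume, you can first deactivate the volume group (optional) :

sudo vgchange -an ora

Now remove the logical volume.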


sudo lvremove ora/bckp
Do you really want to remove active logical volume bckp? [y/n]: y
  Logical volume "bckp" successfully removed

Then we remove the volume group.



sudo vgremove ora
  Volume group "ora" successfully removed

We can now work on the LVM physical devices.



sudo pvremove /dev/mapper/backup01
  Labels on physical volume "/dev/mapper/backup01" successfully wiped
sudo pvremove /dev/mapper/backup02 /dev/mapper/backup03 /dev/mapper/backup04
  Labels on physical volume "/dev/mapper/backup02" successfully wiped
  Labels on physical volume "/dev/mapper/backup03" successfully wiped
  Labels on physical volume "/dev/mapper/backup04" successfully wiped
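
A quick pvs(8) check should confirm the labels are gone; this should print nothing :

sudo pvs | grep backup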


Good, now let's check the multipath status for these four LVM physical devices.

sudo multipath -ll
[...output truncated...]

backup04 (3600508b4000c1ec00001400000b30000) dm-2 HP,HSV300
size=512G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 2:0:0:4 sdd  8:48   active ready running
| `- 3:0:3:4 sdt  65:48  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 2:0:3:4 sdj  8:144  active ready running
  `- 3:0:0:4 sdn  8:208  active ready running
backup03 (3600508b4000c1ec00001400000a60000) dm-3 HP,HSV300
size=512G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 2:0:3:3 sdi  8:128  active ready running
| `- 3:0:0:3 sdm  8:192  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 2:0:0:3 sdc  8:32   active ready running
  `- 3:0:3:3 sds  65:32  active ready running
backup02 (3600508b4000c1ec00001400000980000) dm-1 HP,HSV300
size=512G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 2:0:0:2 sdb  8:16   active ready running
| `- 3:0:3:2 sdr  65:16  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 2:0:3:2 sdh  8:112  active ready running
  `- 3:0:0:2 sdl  8:176  active ready running
backup01 (3600508b4000c1ec00001400000840000) dm-0 HP,HSV300
size=512G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 2:0:0:1 sda  8:0    active ready running
| `- 3:0:3:1 sdq  65:0   active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 2:0:3:1 sdg  8:96   active ready running
  `- 3:0:0:1 sdk  8:160  active ready running


Record the SCSI IDs (the H:C:T:L numbers such as 2:0:0:4 in the output above) as we need them later. A quick way to do so is like this :

for i in backup01 backup02 backup03 backup04; do
      sudo multipath -ll $i | grep ':' | sed -e "s/.- //g" -e "s/^| //g" -e "s/  //g" | cut -d' ' -f1 | tee -a /tmp/ids
done
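
The /tmp/ids file should now contain one SCSI ID per line, four per LUN :

cat /tmp/ids
2:0:0:1
3:0:3:1
2:0:3:1
3:0:0:1
[...and so on for backup02 to backup04...]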

Now remove the LUNs from multipathd(8) control. Note that multipath -f flushes one device map at a time, so we loop over our four maps.

for i in backup01 backup02 backup03 backup04; do sudo multipath -f $i; done

Once that's done, make sure they're not listed in the following output.

sudo multipath -ll | grep backup

Update /etc/multipath.conf to remove the LUNs. In this example, I removed this block from the file's multipaths section. YMMV of course, because your wwids will obviously not be the same.

sudo vim /etc/multipath.conf

<remove>

        multipath {
                wwid "3600508b4000c1ec00001400000840000"
                alias backup01
        }
        multipath {
                wwid "3600508b4000c1ec00001400000980000"
                alias backup02
        }
        multipath {
                wwid "3600508b4000c1ec00001400000a60000"
                alias backup03
        }
        multipath {
                wwid "3600508b4000c1ec00001400000b30000"
                alias backup04
        }
</remove>

Tell multipathd(8) that the configuration has changed.

sudo /etc/init.d/multipathd reload
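
To double-check the running state, multipathd(8)'s interactive console can list the current maps; the backup aliases should be gone :

sudo multipathd -k'show maps'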

Clear the devices from the SCSI subsystem. This is where we need the recorded output from above : the Host:Channel:Target:LUN numbers, which look like 2:0:1:3 in the `multipath -ll` output. Since we previously saved our SCSI IDs in the /tmp/ids file, we can simply do this :

sudo su - root
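# we need a real root shell here: sudo alone would not apply to the > redirection below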
cat /tmp/ids | while read id; do
   echo "1" > /sys/class/scsi_device/${id}/device/delete
done

This will generate log entries similar to this one in /var/log/messages :

Aug 16 13:19:52 oxygen multipathd: sdw: remove path (uevent)

Now that we have safely removed the LUNs from the server, we can remove those LUNs from the storage array. Once you do this, the server from which we just removed the LUNs will complain in its /var/log/messages :

Aug 16 13:48:59 oxygen kernel: sd 5:0:0:1: [sdc] Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.

These are warning messages only and can be safely ignored. To be thorough, we should also issue a LIP from each of the HBA ports on the server. If you don't know how many HBA ports you have, just look in the /sys/class/fc_host directory; there will be one sub-directory per HBA port. In this example, the machine has two single-port HBAs, so we have two sub-directories.

ls /sys/class/fc_host/
host2  host3

To issue a LIP reset, simply do this.

sudo su - root
ls /sys/class/fc_host/ | while read dir; do
    echo $dir
    echo 1 > /sys/class/fc_host/${dir}/issue_lip
done
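
One last sanity check doesn't hurt; this should print nothing :

sudo multipath -ll | grep backup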

And that's it!

Should you want to read more about online storage management under RedHat 6, read the Red Hat Enterprise Linux 6 Storage Administration Guide, « Deploying and configuring single-node storage in Red Hat Enterprise Linux 6 ».

HTH,

DA+
