Multipath

1. Multipath Topology (read from multipath -l output)

Each path is written as four colon-separated numbers: host (i.e. HBA) number, channel (always 0 in our shop, since we only use single-channel HBAs), SCSI target (which corresponds to a switch in our case), and LUN.

[root@dcdrpcora9 ~]# multipath -l
asm_vol2 (36005076801870036a000000000000d57) dm-3 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 3:0:0:0 sdc        8:32  active undef running
| `- 2:0:1:0 sde        8:64  active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 2:0:0:0 sda        8:0   active undef running
  `- 3:0:1:0 sdg        8:96  active undef running
asm_vol1 (36005076801870036a000000000000d58) dm-2 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 2:0:0:1 sdb        8:16  active undef running
| `- 3:0:1:1 sdh        8:112 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 3:0:0:1 sdd        8:48  active undef running
  `- 2:0:1:1 sdf        8:80  active undef running
The following diagram represents the topology given by the multipath -l output shown above. The top row is the HBA cards (as in /sys/class/fc_host/host? or /sys/class/scsi_host/host?). Since we only use Fibre Channel HBAs to build device-mapper multipaths, only HBAs 2 and 3 are shown; HBAs 0 and 1 are not Fibre Channel cards. Channel numbers are omitted because they are all 0. The second row is the two switches, each providing 4 paths to the two storage LUNs at the bottom: two paths coming from HBA2 and two from HBA3.
               HBA2 HBA3       <-- FC host or HBA
               /  \ /  \
              /    V    \
             /    / \    \
           [ SW0 ]   [ SW1 ]   <-- SCSI target or our switch
           /\   /\   /\   /\
          0  1 0  1 0  1 0  1  <-- LUN
        sda  b c  d e  f g  h  <-- path
         /    V    V    V    \
        /    / \  / \  / \    \
       /    /   \/   \/   \    \
      /    /    /\   /\    \    \
     /    /    /  \ /  \    \    \
     ------------- V -------------
     |   LUN0    |/ \|   LUN1    |   <-- LUN
     |  asm_vol2 |   |  asm_vol1 |
     -------------   -------------
    sda,sdc,sde,sdg  sdb,sdd,sdf,sdh <-- path
Take the first path in the multipath -l output as an example: 3:0:0:0 sdc. It originates from HBA3, goes through channel 0 (not shown in the diagram) and switch 0, and ends at LUN0, which is asm_vol2. Now look at the first path for asm_vol1 in the output, 2:0:0:1 sdb. It starts at HBA2, goes through channel 0 (not shown) and switch 0, and ends at LUN1, i.e. asm_vol1.
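
The reading of the host:channel:target:LUN numbers described above can be checked mechanically. The following is only a sketch in Python (not part of the original Perl tooling); the function names parse_paths and by_hba_and_switch are made up for illustration, and the sample text is the asm_vol2 block copied from the output above. It assumes, as in our shop, that the SCSI target number identifies the switch.

```python
import re
from collections import defaultdict

# asm_vol2 block copied from the multipath -l output shown earlier.
SAMPLE = """\
asm_vol2 (36005076801870036a000000000000d57) dm-3 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 3:0:0:0 sdc        8:32  active undef running
| `- 2:0:1:0 sde        8:64  active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 2:0:0:0 sda        8:0   active undef running
  `- 3:0:1:0 sdg        8:96  active undef running
"""

# host:channel:target:lun followed by the sd device name
PATH_RE = re.compile(r"(\d+):(\d+):(\d+):(\d+)\s+(sd[a-z]+)")


def parse_paths(text):
    """Return one dict per path line found in the text (hypothetical helper)."""
    paths = []
    for line in text.splitlines():
        m = PATH_RE.search(line)
        if m:
            host, channel, target, lun = (int(g) for g in m.groups()[:4])
            paths.append({"host": host, "channel": channel,
                          "target": target, "lun": lun, "dev": m.group(5)})
    return paths


def by_hba_and_switch(paths):
    """Group device names by (host, target), i.e. (HBA, switch) in this setup."""
    groups = defaultdict(list)
    for p in paths:
        groups[(p["host"], p["target"])].append(p["dev"])
    return dict(groups)


if __name__ == "__main__":
    grouped = by_hba_and_switch(parse_paths(SAMPLE))
    for (host, target), devs in sorted(grouped.items()):
        print(f"HBA{host} -> SW{target}: {', '.join(devs)}")
```

Grouping by (host, target) gives one entry per line drawn between an HBA and a switch in the diagram, which is a quick way to confirm that each HBA reaches each switch.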

2. Script to check multipath failures

#!/usr/bin/perl -w
#ck_multipaths.pl: Check active multipaths, alert if less than 4 paths (Yong 2013,2014)
#assume mapper device named like ^asm; if not, adjust regexp pattern as needed

$RECIPIENT='you@example.com,yourbuddy@example.com';
$LOGFILE='/root/ck_multipaths.log';
$LOGFILEHIST='/root/ck_multipaths.hist'; #accumulated history

$HOSTNAME=qx(/bin/hostname -s); chomp $HOSTNAME; #strip newline so the mail subject stays on one line

@mps = split /\n/, qx(/sbin/multipath -l);

sub process_mp
{ print "$mp has $cnt active paths.\n"; #path count of last, not this, mp in the loop
  if ($mp=~/^asm/ and $cnt<4)
  { $TM=qx(/bin/date "+%Y%m%d %H:%M"); chomp $TM;
    open LOG, ">>$LOGFILE" or die "Can't open $LOGFILE for write: $!";
    print LOG "$TM: $mp has $cnt active paths!\n";
    close LOG;
  }
}

system "/bin/cat $LOGFILE >> $LOGFILEHIST";
truncate "$LOGFILE", 0;
foreach(@mps)
{ if (/dm-/) #mp (multipath) header line
  { &process_mp if (defined $mp and defined $cnt);
    $cnt = 0;
    $mp = $_; #to be used for next line read
  }
  else
  { $cnt++ if /\d+:\d+ +\[?active/; #line pattern: "...major:minor active..." or "... [active"
  }
}

#the "finally" block
&process_mp if (defined $mp and defined $cnt);

system "/bin/mail -s \"Alert from $HOSTNAME\" $RECIPIENT < $LOGFILE" if -s $LOGFILE;

Yong Huang 2013,2014
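
For readers who prefer to test the counting logic without a live multipath setup, here is a rough Python sketch of the same idea as the Perl script: a line containing dm- starts a new map, and each following line matching the "major:minor active" pattern adds one to that map's count. The function names count_active_paths and degraded are hypothetical. FULL is the asm_vol1 block from the output above; TRIMMED is a hand-trimmed copy with one path line removed, used purely to exercise the alert branch, and is not real output from a failed path.

```python
import re

MIN_PATHS = 4  # same threshold the Perl script alerts on

# asm_vol1 block copied from the multipath -l output shown earlier.
FULL = """\
asm_vol1 (36005076801870036a000000000000d58) dm-2 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 2:0:0:1 sdb        8:16  active undef running
| `- 3:0:1:1 sdh        8:112 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 3:0:0:1 sdd        8:48  active undef running
  `- 2:0:1:1 sdf        8:80  active undef running
"""

# Hand-trimmed copy (sdf line dropped) to simulate a map with only 3 paths.
TRIMMED = "\n".join(l for l in FULL.splitlines() if "sdf" not in l)

# Same pattern as the Perl script: "...major:minor active..." or "... [active"
ACTIVE_RE = re.compile(r"\d+:\d+ +\[?active")


def count_active_paths(text):
    """Return {map_name: active_path_count} for every map in the text."""
    counts = {}
    current = None
    for line in text.splitlines():
        if "dm-" in line:                 # map header line
            current = line.split()[0]
            counts[current] = 0
        elif current is not None and ACTIVE_RE.search(line):
            counts[current] += 1
    return counts


def degraded(counts, prefix="asm", minimum=MIN_PATHS):
    """Return the maps whose name starts with prefix and have too few paths."""
    return {name: n for name, n in counts.items()
            if name.startswith(prefix) and n < minimum}


if __name__ == "__main__":
    for label, text in (("full", FULL), ("trimmed", TRIMMED)):
        print(label, count_active_paths(text), degraded(count_active_paths(text)))
```

Unlike the Perl version, this sketch keeps a dictionary keyed by map name, so no separate "finally" step is needed for the last map.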



My comments on multipath.conf settings
path_grouping_policy: When it's set to multibus for active/active devices, all paths are in one group, which is easier to manage (much like a hard disk with a single C: partition).
getuid_callout: Manually run the script to make sure it fetches wwid correctly.
features: Make very sure queue_if_no_path is not enabled (i.e. features is not "1 queue_if_no_path") for Oracle RAC; either set features to "0" or don't set it at all.
path_checker: Setting it to tur is for active/passive only.
failback: Must be immediate for fast failover.
rr_min_io: A smaller value (than the default 1000) may be better for OLTP? Note that it's not rr_min_io requests, but that number multiplied by the priority value of the requests, that must be completed before switching paths.
no_path_retry: Must be set to fail for Oracle RAC, according to numerous Oracle and Red Hat articles. Make sure it's not overridden in the more specific section below, such as devices{}.


Our case
Sep 03 2014 at 04:51 PM -04:00
Our test shows that with no_path_retry set to fail, features commented out (no need to set it to "0 queue_if_no_path"), and a few other parameters that are probably not very relevant (polling_interval=10, path_selector="round-robin 0", path_checker=readsector0, rr_min_io=100), we no longer get the "multipathd blocked for xxx seconds" message and the server stays up.



After you make changes to the multipath settings, reload the maps (multipath -r) and the multipathd service (service multipathd reload), then check the running configuration:

# multipathd -k
multipathd> show config
defaults {
        verbosity 2
        polling_interval 10
        udev_dir "/dev"
        multipath_dir "/lib64/multipath"
        path_selector "round-robin 0"
        path_grouping_policy multibus
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        prio alua
        features "0"
...
You can also use this one-line command to do it: echo "show config" | multipathd -k
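
Since a setting in defaults can be overridden in a more specific section such as devices{}, it may help to scan the whole "show config" output rather than eyeball it. The Python sketch below is one way to do that; the helper names show_config, find_settings and risky are hypothetical, and SAMPLE is a hand-written fragment in the same layout (not captured output). show_config simply mirrors the one-line command above and is untested here, since it needs root and a running multipathd.

```python
import subprocess

# Hand-written illustration in the "show config" layout; NOT real captured output.
SAMPLE = """\
defaults {
        polling_interval 10
        path_selector "round-robin 0"
        features "0"
        no_path_retry fail
}
devices {
        device {
                vendor "IBM"
                product "2145"
                features "1 queue_if_no_path"
        }
}
"""


def show_config():
    """Equivalent of: echo "show config" | multipathd -k (assumes root access)."""
    result = subprocess.run(["multipathd", "-k"], input="show config\n",
                            capture_output=True, text=True, check=True)
    return result.stdout


def find_settings(config_text, keys):
    """Return (section_path, key, value) for every occurrence of the given keys."""
    stack, found = [], []
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.endswith("{"):            # entering a section
            stack.append(line[:-1].strip())
        elif line == "}":                 # leaving a section
            if stack:
                stack.pop()
        else:
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] in keys:
                found.append(("/".join(stack), parts[0], parts[1].strip('"')))
    return found


def risky(found):
    """Flag values the notes above warn against for Oracle RAC."""
    bad = []
    for section, key, value in found:
        if key == "features" and "queue_if_no_path" in value:
            bad.append((section, key, value))
        elif key == "no_path_retry" and value != "fail":
            bad.append((section, key, value))
    return bad


if __name__ == "__main__":
    for section, key, value in risky(find_settings(SAMPLE, {"features", "no_path_retry"})):
        print(f"check {section}: {key} {value}")
```

On a real server, pass show_config() instead of SAMPLE to find_settings; an empty result from risky would suggest that neither defaults nor any device section re-enables queueing.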
