1. Multipath Topology (read from multipath -l output)
The four colon-separated numbers that identify each path are the host (i.e. HBA) number, the channel (always 0 in our shop, since we only use single-channel HBAs), the SCSI target (which represents the switch in our case), and the LUN.
[root@dcdrpcora9 ~]# multipath -l
asm_vol2 (36005076801870036a000000000000d57) dm-3 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 3:0:0:0 sdc 8:32  active undef running
| `- 2:0:1:0 sde 8:64  active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 2:0:0:0 sda 8:0   active undef running
  `- 3:0:1:0 sdg 8:96  active undef running
asm_vol1 (36005076801870036a000000000000d58) dm-2 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 2:0:0:1 sdb 8:16  active undef running
| `- 3:0:1:1 sdh 8:112 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 3:0:0:1 sdd 8:48  active undef running
  `- 2:0:1:1 sdf 8:80  active undef running

The following diagram represents the topology given by the multipath -l output shown above. HBA2 and HBA3 are the HBA cards (as in /sys/class/fc_host/host? or /sys/class/scsi_host/host?). Since we only use fibre channel HBAs to make device mapper multipaths, only HBAs 2 and 3 are shown; HBAs 0 and 1 are not fibre channel cards. Channel numbers are ignored; they are all 0. Each of the two switches, SW0 and SW1, provides 4 paths going to the two storage LUNs, two paths coming from HBA2 and two from HBA3.
          HBA2   HBA3             <-- FC host or HBA
          /   \ /   \
         /     V     \
        /     / \     \
       [ SW0 ]   [ SW1 ]          <-- SCSI target or our switch
       /\   /\   /\   /\
      0  1 0  1 0  1 0  1         <-- LUN
    sda  b c  d e  f g  h         <-- path
     /    V    V    V    \
    /    / \  / \  / \    \
   /    /   \/   \/   \    \
  /    /    /\   /\    \    \
 /    /    /  \ /  \    \    \
 ------------- V -------------
 |    LUN0   |/ \|    LUN1   |    <-- LUN
 |  asm_vol2 |   |  asm_vol1 |
 -------------   -------------
sda,sdc,sde,sdg sdb,sdd,sdf,sdh   <-- path

Take the first path in the multipath -l output as an example, path 3:0:0:0 sdc. It originates from HBA3, goes through channel 0 (not shown in the diagram) and switch 0, and ends at LUN0, which is asm_vol2. Now look at the first path for asm_vol1 in the output, 2:0:0:1 sdb. It starts at HBA2, goes through channel 0 (not shown) and switch 0, and ends at LUN1, i.e. asm_vol1.
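The walk-through above can be done mechanically. The following is only a sketch: decode_paths is a made-up helper name, and it assumes the multipath -l layout shown above (a header line containing dm-N, then path lines carrying an H:C:T:L field followed by the sd device).

```shell
# decode_paths (hypothetical helper): read `multipath -l` style output on
# stdin and print one line per path: map, host, channel, target, lun, device.
decode_paths() {
    awk '
        # Header line of a map, e.g. "asm_vol2 (3600...) dm-3 IBM,2145"
        / dm-[0-9]+ / { map = $1; next }
        # Path line: locate the H:C:T:L field; the field after it is the device
        {
            for (i = 1; i < NF; i++) {
                if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
                    split($i, n, ":")
                    print map, "host=" n[1], "channel=" n[2], "target=" n[3], "lun=" n[4], $(i + 1)
                    break
                }
            }
        }
    '
}

# Sample input copied from the output above (first path group of asm_vol2)
decode_paths <<'EOF'
asm_vol2 (36005076801870036a000000000000d57) dm-3 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 3:0:0:0 sdc 8:32 active undef running
| `- 2:0:1:0 sde 8:64 active undef running
EOF
# prints:
# asm_vol2 host=3 channel=0 target=0 lun=0 sdc
# asm_vol2 host=2 channel=0 target=1 lun=0 sde
```

On a live server the same function would presumably be fed with multipath -l | decode_paths.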
2. Script to check multipath failures
#!/usr/bin/perl -w
#ck_multipaths.pl: Check active multipaths, alert if less than 4 paths (Yong 2013,2014)
#assume mapper device named like ^asm; if not, adjust regexp pattern as needed

$RECIPIENT='you@example.com,yourbuddy@example.com';
$LOGFILE='/root/ck_multipaths.log';
$LOGFILEHIST='/root/ck_multipaths.hist'; #accumulated history
$HOSTNAME=qx(/bin/hostname -s);
chomp $HOSTNAME; #strip trailing newline so it doesn't end up in the mail subject
@mps = split /\n/, qx(/sbin/multipath -l);

sub process_mp
{
  print "$mp has $cnt active paths.\n"; #path count of last, not this, mp in the loop
  if ($mp=~/^asm/ and $cnt<4)
  {
    $TM=qx(/bin/date "+%Y%m%d %H:%M");
    chomp $TM;
    open LOG, ">>$LOGFILE" or die "Can't open $LOGFILE for write: $!";
    print LOG "$TM: $mp has $cnt active paths!\n";
    close LOG;
  }
}

#move the previous run's log to the history file, then start with an empty log
system "/bin/cat $LOGFILE >> $LOGFILEHIST";
truncate "$LOGFILE", 0;

foreach(@mps)
{
  if (/dm-/) #mp (multipath) header line
  {
    &process_mp if (defined $mp and defined $cnt);
    $cnt = 0;
    $mp = $_; #to be used for next line read
  }
  else
  {
    $cnt++ if /\d+:\d+ +\[?active/; #line pattern: "...major:minor active..." or "... [active"
  }
}

#the "finally" block
&process_mp if (defined $mp and defined $cnt);

#mail the log only if this run wrote something to it
system "/bin/mail -s \"Alert from $HOSTNAME\" $RECIPIENT < $LOGFILE" if -s $LOGFILE;
Yong Huang 2013,2014
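For a quick check at the command line, the counting logic of the script above can be approximated in shell. This is a sketch, not a replacement for the Perl script: count_active_paths and degraded_maps are made-up names, the path-line pattern is the same one the script uses, and no logging or mailing is done.

```shell
# count_active_paths (hypothetical helper): read `multipath -l` output on
# stdin and print "<map> <number of active paths>" for each map.
count_active_paths() {
    awk '
        / dm-[0-9]+ / {                          # multipath header line
            if (map != "") print map, cnt        # flush the previous map
            map = $1; cnt = 0; next
        }
        /[0-9]+:[0-9]+ +\[?active/ { cnt++ }     # "major:minor active" or "[active"
        END { if (map != "") print map, cnt }    # flush the last map
    '
}

# degraded_maps [MIN] (hypothetical helper): report maps named like ^asm
# that have fewer than MIN (default 4) active paths.
degraded_maps() {
    min=${1:-4}
    count_active_paths |
        awk -v min="$min" '$1 ~ /^asm/ && $2 < min { print $1 " has " $2 " active paths!" }'
}

# Sample input from section 1, truncated to two paths so that it triggers the alert
degraded_maps <<'EOF'
asm_vol2 (36005076801870036a000000000000d57) dm-3 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 3:0:0:0 sdc 8:32 active undef running
| `- 2:0:1:0 sde 8:64 active undef running
EOF
# prints: asm_vol2 has 2 active paths!
```

On a real server the input would come from /sbin/multipath -l piped into degraded_maps 4.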
My comments on multipath.conf settings
path_grouping_policy: When it's set to multibus for active/active devices, all paths are in one group, just like a hard disk that has only a C: partition, which is easier to manage.
getuid_callout: Manually run the script to make sure it fetches the WWID correctly.
features: Make very sure not to set it to "1 queue_if_no_path" for Oracle RAC; either set it to "0" or don't set features at all.
path_checker: Setting it to tur is for active/passive only.
failback: Must be immediate for fast failover.
rr_min_io: A smaller value (than the default 1000) may be better for OLTP? Note that it's not rr_min_io requests, but that number multiplied by the priority value of the requests, that must be done before switching paths.
no_path_retry: Must be set to fail for Oracle RAC, according to numerous Oracle and Red Hat articles. Make sure it's not overridden in the more specific section below, such as devices{}.
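The features and no_path_retry advice above is easy to get wrong when a devices{} or multipaths{} section overrides the defaults, so a quick scan of the whole file may help. The following is a rough sketch: check_queueing is a made-up name, it only looks at these two keywords, it flags any features line that mentions queue_if_no_path, and it treats anything other than the literal value fail as suspicious. The sample input is a hypothetical fragment, not our real configuration.

```shell
# check_queueing (hypothetical helper): read multipath.conf-style text on
# stdin and print a warning for each setting that may enable I/O queueing.
check_queueing() {
    awk '
        $1 ~ /^#/ { next }                       # skip commented-out lines
        $1 == "features" && /queue_if_no_path/ {
            print "line " NR ": features enables queue_if_no_path"
        }
        $1 == "no_path_retry" && $2 != "fail" && $2 != "\"fail\"" {
            print "line " NR ": no_path_retry is " $2 ", not fail"
        }
    '
}

# Hypothetical config fragment, for illustration only
check_queueing <<'EOF'
defaults {
    features "1 queue_if_no_path"
    no_path_retry 5
}
EOF
# prints:
# line 2: features enables queue_if_no_path
# line 3: no_path_retry is 5, not fail
```

Against the real file the call would be check_queueing < /etc/multipath.conf; no output means neither keyword was found with a queueing value in any section.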
Our case
Sep 03 2014 at 04:51 PM -04:00
Our test shows that with no_path_retry set to fail, features commented out (no need to set it to "0 queue_if_no_path"), and a few other parameters that are probably not very relevant (polling_interval=10, path_selector="round-robin 0", path_checker=readsector0, rr_min_io=100), we no longer get the "multipathd blocked for xxx seconds" message and the server stays up.
Another case
Server I/O wait is high (shown in %wa of top or %iowait of sar), Oracle frequently stalls, and /var/log/messages has lines like
May 21 13:35:27 myhost kernel: qla2xxx [0000:0b:00.0]-801c:2: Abort command issued nexus=2:0:5 -- 1 2002.
May 21 13:35:27 myhost kernel: qla2xxx [0000:0b:00.0]-801c:2: Abort command issued nexus=2:1:2 -- 1 2002.

The root cause was later found to be a bad Cisco core switch, but temporarily disabling the faulted paths in the multipath devices is a workaround. The key is to identify the faulted path devices. According to an HP article, the numbers after nexus are the host, SCSI target and LUN, which in our case identify the path devices 2:0:0:5 (sdf) and 2:0:1:2 (sdi) in the multipath -l output below:
# multipath -l
...
asm_vol5 (36005076801870036a0000000000010a8) dm-11 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 2:0:0:5 sdf 8:80   active undef unknown    <-- matches nexus=2:0:5
| `- 3:0:0:5 sdr 65:16  active undef unknown
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 2:0:1:5 sdl 8:176  active undef unknown
  `- 3:0:1:5 sdx 65:112 active undef unknown
...
asm_vol3 (36005076801870036a000000000000ef2) dm-6 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 2:0:1:2 sdi 8:128  active undef unknown    <-- matches nexus=2:1:2
| `- 3:0:1:2 sdu 65:64  active undef unknown
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 2:0:0:2 sdc 8:32   active undef unknown
  `- 3:0:0:2 sdo 8:224  active undef unknown
...

The second number of each path (the channel, always 0 with our single-channel HBAs) does not appear in the nexus numbers, so skip it when matching: nexus=2:0:5 is path 2:0:0:5 and nexus=2:1:2 is path 2:0:1:2. To stop the frequent I/O hangs, we can delete the corresponding sd devices, which are the path devices of the multipath devices.
# echo 1 > /sys/block/sdf/device/delete
# echo 1 > /sys/block/sdi/device/delete

After a while, the faulted path devices will be gone from the multipath devices, system I/O wait comes down from 40% to very low, and Oracle runs normally.
It's important to find all faulted devices, with a command like
# grep nexus /var/log/messages* | awk '{print $11}' | sort | uniq -c | sort -n
   1601 nexus=2:0:1    <-- corresponds to 2:0:0:1 in `multipath -l' output
   1677 nexus=2:0:5
   1941 nexus=2:1:2
   2146 nexus=2:1:4
   2158 nexus=2:1:0
   2248 nexus=2:0:3    <-- the one with the most faults

Typically, each multipath device will have one path device failing.
2018-05
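Matching nexus numbers to path devices by eye gets tedious with many LUNs. Here is a small sketch of the lookup, assuming (as in our shop) that the channel is always 0; nexus_to_dev and print_delete_cmds are made-up names, and the second one only prints the delete commands so they can be reviewed before anything is run.

```shell
# nexus_to_dev NEXUS (hypothetical helper): read `multipath -l` output on
# stdin and print the sd device whose H:C:T:L equals NEXUS with channel 0
# inserted, e.g. nexus 2:0:5 -> path 2:0:0:5.
nexus_to_dev() {
    hctl=$(printf '%s\n' "$1" | awk -F: '{ print $1 ":0:" $2 ":" $3 }')
    awk -v hctl="$hctl" '{ for (i = 1; i < NF; i++) if ($i == hctl) print $(i + 1) }'
}

# print_delete_cmds (hypothetical helper): dry run only. Prints, but does not
# execute, the delete command for each device name read on stdin.
print_delete_cmds() {
    while read -r dev; do
        printf 'echo 1 > /sys/block/%s/device/delete\n' "$dev"
    done
}

# Sample input copied from the asm_vol5 output above
nexus_to_dev 2:0:5 <<'EOF' | print_delete_cmds
asm_vol5 (36005076801870036a0000000000010a8) dm-11 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 2:0:0:5 sdf 8:80 active undef unknown
| `- 3:0:0:5 sdr 65:16 active undef unknown
EOF
# prints: echo 1 > /sys/block/sdf/device/delete
```

Printing instead of executing is deliberate: deleting the wrong sd device removes a healthy path.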
After you make changes to multipath settings, reload the maps (multipath -r) and the multipathd service (service multipathd reload), and check the running configuration:
# multipathd -k
multipathd> show config
defaults {
        verbosity 2
        polling_interval 10
        udev_dir "/dev"
        multipath_dir "/lib64/multipath"
        path_selector "round-robin 0"
        path_grouping_policy multibus
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        prio alua
        features "0"
...

You can also use this one-line command to do it: echo "show config" | multipathd -k
Some very preliminary notes:
login as: oracle
myhost ~ $ cd /sys/class/fc_remote_ports
myhost fc_remote_ports $ sudo multipath -l > /tmp/multipath.out
[sudo] password for oracle:
myhost fc_remote_ports $ head /tmp/multipath.out #see what the output looks like
ASM_DATA37_CPB (36005076801870036a000000000000e69) dm-242 IBM,2145
size=250G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 0:0:7:16 sdoo 129:320 active undef running
| `- 1:0:7:16 sduc 66:576  active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 0:0:5:16 sdmg 69:384  active undef running
  `- 1:0:6:16 sdmb 69:304  active undef running
ASM_DATA22_CPB (36005076801870036a000000000000e5a) dm-155 IBM,2145
size=250G features='0' hwhandler='0' wp=rw
myhost fc_remote_ports $ grep -- '- [0-9]:0' /tmp/multipath.out | cut -c6-10 | sort | uniq -c #assume single digit host
     77 0:0:1    <-- this host-target combination is used 77 times to form LUNs
     77 0:0:2
      1 0:0:4    <-- this combination is only used once
     74 0:0:5
      1 0:0:6
     74 0:0:7
     77 1:0:0
     77 1:0:1
      1 1:0:4
      1 1:0:5
     74 1:0:6
     74 1:0:7
myhost fc_remote_ports $ ls
rport-0:0-0  rport-0:0-10  rport-0:0-2  rport-0:0-4  rport-0:0-9  rport-1:0-1   rport-1:0-11  rport-1:0-3  rport-1:0-8
rport-0:0-1  rport-0:0-11  rport-0:0-3  rport-0:0-8  rport-1:0-0  rport-1:0-10  rport-1:0-2   rport-1:0-4  rport-1:0-9
myhost fc_remote_ports $ ls rport-0:0-0
device        fast_io_fail_tmo  node_name  port_name   power  scsi_target_id  supported_classes
dev_loss_tmo  maxframe_size     port_id    port_state  roles  subsystem       uevent
myhost fc_remote_ports $ for i in */scsi_target_id; do echo -n "$i: "; cat $i; done
rport-0:0-0/scsi_target_id: -1    <-- not a real fibre channel target
rport-0:0-10/scsi_target_id: 6
rport-0:0-11/scsi_target_id: 7
rport-0:0-1/scsi_target_id: 0
rport-0:0-2/scsi_target_id: 1
rport-0:0-3/scsi_target_id: 2
rport-0:0-4/scsi_target_id: 3
rport-0:0-8/scsi_target_id: 4
rport-0:0-9/scsi_target_id: 5
rport-1:0-0/scsi_target_id: -1    <-- same here
rport-1:0-10/scsi_target_id: 6
rport-1:0-11/scsi_target_id: 7
rport-1:0-1/scsi_target_id: 0
rport-1:0-2/scsi_target_id: 1
rport-1:0-3/scsi_target_id: 2
rport-1:0-4/scsi_target_id: 3
rport-1:0-8/scsi_target_id: 4
rport-1:0-9/scsi_target_id: 5
myhost fc_remote_ports $ for i in */roles; do echo -n "$i: "; cat $i; done
rport-0:0-0/roles: Directory Server
rport-0:0-10/roles: FCP Target, FCP Initiator
rport-0:0-11/roles: FCP Target, FCP Initiator
rport-0:0-1/roles: FCP Target, FCP Initiator
rport-0:0-2/roles: FCP Target, FCP Initiator
rport-0:0-3/roles: FCP Target, FCP Initiator
rport-0:0-4/roles: FCP Target, FCP Initiator
rport-0:0-8/roles: FCP Target, FCP Initiator
rport-0:0-9/roles: FCP Target, FCP Initiator
rport-1:0-0/roles: Directory Server
rport-1:0-10/roles: FCP Target, FCP Initiator
rport-1:0-11/roles: FCP Target, FCP Initiator
rport-1:0-1/roles: FCP Target, FCP Initiator
rport-1:0-2/roles: FCP Target, FCP Initiator
rport-1:0-3/roles: FCP Target, FCP Initiator
rport-1:0-4/roles: FCP Target, FCP Initiator
rport-1:0-8/roles: FCP Target, FCP Initiator
rport-1:0-9/roles: FCP Target, FCP Initiator
myhost fc_remote_ports $ for i in */supported_classes; do echo -n "$i: "; cat $i; done
rport-0:0-0/supported_classes: unspecified
rport-0:0-10/supported_classes: Class 3
rport-0:0-11/supported_classes: Class 3
rport-0:0-1/supported_classes: Class 3
rport-0:0-2/supported_classes: Class 3
rport-0:0-3/supported_classes: Class 3
rport-0:0-4/supported_classes: Class 3
rport-0:0-8/supported_classes: Class 3
rport-0:0-9/supported_classes: Class 3
rport-1:0-0/supported_classes: unspecified
rport-1:0-10/supported_classes: Class 3
rport-1:0-11/supported_classes: Class 3
rport-1:0-1/supported_classes: Class 3
rport-1:0-2/supported_classes: Class 3
rport-1:0-3/supported_classes: Class 3
rport-1:0-4/supported_classes: Class 3
rport-1:0-8/supported_classes: Class 3
rport-1:0-9/supported_classes: Class 3
myhost fc_remote_ports $ grep tmo /etc/multipath.conf
#fast_io_fail_tmo 5
myhost fc_remote_ports $ for i in */dev_loss_tmo; do echo -n "$i: "; cat $i; done #default 30 seconds
rport-0:0-0/dev_loss_tmo: 30
rport-0:0-10/dev_loss_tmo: 30
rport-0:0-11/dev_loss_tmo: 30
rport-0:0-1/dev_loss_tmo: 30
rport-0:0-2/dev_loss_tmo: 30
rport-0:0-3/dev_loss_tmo: 30
rport-0:0-4/dev_loss_tmo: 30
rport-0:0-8/dev_loss_tmo: 30
rport-0:0-9/dev_loss_tmo: 30
rport-1:0-0/dev_loss_tmo: 30
rport-1:0-10/dev_loss_tmo: 30
rport-1:0-11/dev_loss_tmo: 30
rport-1:0-1/dev_loss_tmo: 30
rport-1:0-2/dev_loss_tmo: 30
rport-1:0-3/dev_loss_tmo: 30
rport-1:0-4/dev_loss_tmo: 30
rport-1:0-8/dev_loss_tmo: 30
rport-1:0-9/dev_loss_tmo: 30
myhost fc_remote_ports $ for i in */fast_io_fail_tmo; do echo -n "$i: "; cat $i; done #default?
rport-0:0-0/fast_io_fail_tmo: off
rport-0:0-10/fast_io_fail_tmo: 5
rport-0:0-11/fast_io_fail_tmo: 5
rport-0:0-1/fast_io_fail_tmo: off
rport-0:0-2/fast_io_fail_tmo: 5
rport-0:0-3/fast_io_fail_tmo: 5
rport-0:0-4/fast_io_fail_tmo: off
rport-0:0-8/fast_io_fail_tmo: 5
rport-0:0-9/fast_io_fail_tmo: 5
rport-1:0-0/fast_io_fail_tmo: off
rport-1:0-10/fast_io_fail_tmo: 5
rport-1:0-11/fast_io_fail_tmo: 5
rport-1:0-1/fast_io_fail_tmo: 5
rport-1:0-2/fast_io_fail_tmo: 5
rport-1:0-3/fast_io_fail_tmo: off
rport-1:0-4/fast_io_fail_tmo: off
rport-1:0-8/fast_io_fail_tmo: 5
rport-1:0-9/fast_io_fail_tmo: 5
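The separate for loops above can be folded into one pass that prints the interesting attributes side by side, which makes the ports with fast_io_fail_tmo off easier to spot. This is only a sketch: rport_summary is a made-up name, it reads the same sysfs files listed above, and the optional directory argument exists only so that the function can be tried against a copy of the tree.

```shell
# rport_summary [DIR] (hypothetical helper): print one line per FC remote
# port with its scsi_target_id, dev_loss_tmo and fast_io_fail_tmo.
# DIR defaults to /sys/class/fc_remote_ports.
rport_summary() {
    dir=${1:-/sys/class/fc_remote_ports}
    for p in "$dir"/rport-*; do
        [ -d "$p" ] || continue        # nothing matched, e.g. no FC HBA present
        printf '%s target=%s dev_loss_tmo=%s fast_io_fail_tmo=%s\n' \
            "${p##*/}" \
            "$(cat "$p/scsi_target_id")" \
            "$(cat "$p/dev_loss_tmo")" \
            "$(cat "$p/fast_io_fail_tmo")"
    done
}

# Prints nothing on a host without fibre channel remote ports
rport_summary
```

Piping the result through grep 'fast_io_fail_tmo=off' would list the ports that did not pick up the timeout; entries with target=-1 can be ignored since they are not real fibre channel targets.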