Multipath on ubuntu
Posted by Lebo77@reddit | linuxadmin | 8 comments
So I got some remanufactured SAS drives to put in my 12-bay disk shelf. It's set up with two SAS cables running from the HBA in my server to the two expanders/controllers in the shelf. To split I/O between these two paths I am using the multipath-tools package.
I have 10 disks in there now and it works great. All the disks show up in /dev/mapper/mpath...
These new disks, however, do not. I still see them when I run lsblk (two copies of each disk), and smartctl shows me identical serial numbers for both. The issue is that multipath doesn't seem to be finding them.
So, any ideas where I should start debugging this?
Intergalactic_Ass@reddit
First off: nothing particularly special about Ubuntu. multipathd and device-mapper handle this and they're not unique to Debian or Red Hat.
If your new disks are not showing up with a mpath device, have you looked at the /etc/multipath/bindings file?
What's your multipath.conf look like? Any chatter in multipath -v3 about these serials specifically? multipath -v3 -c against the block devices in question?
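For reference, the checks above can be run roughly like this (device names are placeholders for the affected disks):

```shell
# Show the WWID-to-alias bindings multipathd has recorded so far
cat /etc/multipath/bindings

# Dump the full configuration as multipathd actually sees it
sudo multipath -t

# Verbose dry-run check of a specific path device (replace sdX)
sudo multipath -v3 -c /dev/sdX
```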
Lebo77@reddit (OP)
So multipath.conf is super basic:
defaults {
user_friendly_names yes
path_grouping_policy multibus
}
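For what it's worth, a common way to make path selection explicit rather than relying on autodetection is a blacklist-everything/whitelist-by-WWID pattern. This is only a sketch, not a drop-in fix, and the WWID placeholder below must be replaced with your drives' actual IDs:

```
defaults {
    user_friendly_names yes
    path_grouping_policy multibus
}

# Illustrative only: ignore everything except explicitly listed WWIDs
blacklist {
    wwid ".*"
}

blacklist_exceptions {
    wwid "<your-disk-wwid>"
}
```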
Again, the other 10 drives are working fine.
multipath -v3 -c /dev/sdd gives:
10590.156325 | set open fds limit to 1048576/1048576
10590.156349 | loading //lib/multipath/libchecktur.so checker
10590.156456 | checker tur: message table size = 3
10590.156469 | loading //lib/multipath/libprioconst.so prioritizer
10590.156575 | _init_foreign: foreign library "nvme" is not enabled
10590.156920 | sdd: size = 23437770752
10590.157229 | unloading tur checker
10590.157259 | unloading const prioritizer
Note, these devices are NOT nvme. They are regular spinning rust SAS drives.
Going all the way to multipath -v5 /dev/sdv (the other duplicate) gives:
10760.138279 | set open fds limit to 1048576/1048576
10760.138307 | loading //lib/multipath/libchecktur.so checker
10760.138415 | checker tur: message table size = 3
10760.138429 | loading //lib/multipath/libprioconst.so prioritizer
10760.138547 | _init_foreign: found libforeign-nvme.so
10760.138558 | _init_foreign: foreign library "nvme" is not enabled
10760.138577 | sdv: dev not found in pathvec
10760.138864 | sdv: mask = 0x31
10760.138872 | sdv: dev_t = 65:80
10760.138878 | open '/sys/devices/pci0000:00/0000:00:01.2/0000:02:00.2/0000:03:00.0/0000:04:00.0/host0/port-0:1/expander-0:1/port-0:1:10/end_device-0:1:10/target0:0:22/0:0:22:0/block/sdv/size'
10760.138904 | sdv: size = 23437770752
10760.139181 | sdv: can't store path info
10760.139188 | /dev/sdv: failed to get wwid
10760.139192 | scope is null
10760.139230 | unloading tur checker
10760.139258 | unloading const prioritizer
The "can't store path info" and "failed to get wwid" seem like major red flags, but I am not sure what to do about them.
I tried this with /dev/sdd and it gave identical output.
(P.S. Thank you for your help.)
Ok_Jump6953@reddit
Hi, Ubuntu maintainer for multipath-tools here. I'm curious: what version of Ubuntu are you using? Does multipath create the bindings in /etc/multipath/bindings?
That `failed to get wwid` definitely seems alarming and isn't something I have seen yet. Are you able to list the WWID for each disk?
Try:
$ sudo lsscsi --scis_id
Any alarming errors with multipath in dmesg?
$ sudo dmesg | grep multipath
Lebo77@reddit (OP)
Ubuntu version: Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-122-generic x86_64)
lsscsi --scis_id and lsscsi --scis_id /dev/sdd just give me:
unrecognized option '--scis_id'
but
/lib/udev/scsi_id --page=0x83 -g -u --whitelisted --device=/dev/sdd
gives me:
35000c500dad70e57
and /lib/udev/scsi_id --page=0x83 -g -u --whitelisted --device=/dev/sdv
gives me
35000c500dad70e57
They are clearly the same disk, with a real, matching WWID.
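Since scsi_id can retrieve the WWID, one thing worth trying (a sketch; whether multipathd then accepts the path depends on where its own wwid lookup is failing) is registering the device manually:

```shell
# Add the WWID of one path device to /etc/multipath/wwids
sudo multipath -a /dev/sdd

# Then have multipathd re-read its config and rebuild maps
sudo multipathd reconfigure
```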
sudo dmesg | grep multipath returns:
[ 13.902425] systemd[1]: Listening on multipathd control socket.
[ 14.073718] device-mapper: multipath service-time: version 0.3.0 loaded
P.S.: Thank you for working on multipath. I have been using it successfully for a year to run a 10-drive zfs array on this same disk shelf and it's been flawless up to this point. I am sure that if I had not cheaped out and gone with renewed disks this would not be a problem. I suspect something done to the drives' firmware during remanufacturing is messing this up.
Ok_Jump6953@reddit
whoops sorry, typo writing commands on my phone, I got the flag wrong.
$ sudo lsscsi --scsi_id
But that's just to retrieve the WWID, which you already got, so no need to re-run.
Does this drive happen to be a Seagate factory recertified 'white label' drive? Perhaps this is the same issue as https://github.com/opensvc/multipath-tools/issues/56
Lebo77@reddit (OP)
We have a winner!
You are absolutely an open-source rock star.
I was wondering if the lack of a vendor name was part of the problem. Thank you for all the help, and have a fantastic day.
Ok_Jump6953@reddit
Glad to hear the patch worked out :)
Lebo77@reddit (OP)
I am not at home, but I think you nailed it. I will give it a shot tomorrow and let you know. Thank you so much.