HA NFS server with DRBD, ZFS and Heartbeat on CentOS 6.6 [FIX]

While following this how-to to set up an HA NFS server, I ran into a problem with ZFS during the Heartbeat takeover process.
When forcing the active machine into standby via:
[bash]
/usr/share/heartbeat/hb_standby
[/bash]
the takeover failed with:
[bash]

ResourceManager(default)[3439]: 2015/07/13_16:22:25 info: Releasing resource group: distfs1-test IPaddr::172.16.10.10/24/eth0 drbddisk::data zfs nfs
ResourceManager(default)[3439]: 2015/07/13_16:22:25 info: Running /etc/init.d/nfs stop
ResourceManager(default)[3439]: 2015/07/13_16:22:26 info: Running /etc/init.d/zfs stop
ResourceManager(default)[3439]: 2015/07/13_16:22:26 info: Running /etc/ha.d/resource.d/drbddisk data stop
ResourceManager(default)[3439]: 2015/07/13_16:22:28 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
ResourceManager(default)[3439]: 2015/07/13_16:22:29 info: Retrying failed stop operation [drbddisk::data]
ResourceManager(default)[3439]: 2015/07/13_16:22:29 info: Running /etc/ha.d/resource.d/drbddisk data stop
ResourceManager(default)[3439]: 2015/07/13_16:22:31 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk

[/bash]

Running the drbddisk stop script by hand produced a more detailed error:
[bash]
[root@distfs1-test ~]# /etc/ha.d/resource.d/drbddisk data stop
1: State change failed: (-12) Device is held open by someone
Command 'drbdsetup-84 secondary 1' terminated with exit code 11
1: State change failed: (-12) Device is held open by someone
Command 'drbdsetup-84 secondary 1' terminated with exit code 11
[/bash]

Further investigation showed that even after running:
[bash]
/etc/init.d/zfs stop
[/bash]
there was still a ZFS kernel thread running:
[bash]
[root@distfs1-test ~]# ps xa| grep zfs
4021 ? S< 0:00 [zfs_iput_taskq/]
[/bash]
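
A quick way to confirm that the module itself is still loaded is to look at /proc/modules. The following is only a sketch: the function names are made up for illustration, and the optional second argument exists solely so the functions can be pointed at a sample file instead of the live system.

```bash
#!/bin/bash
# Return 0 if the named kernel module appears in a /proc/modules-style file.
# The second argument is a hypothetical override, useful for testing.
module_loaded() {
    local name="$1" modules_file="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Print the use count (third column) of the module, or nothing if it is absent.
module_refcount() {
    local name="$1" modules_file="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { print $3 }' "$modules_file"
}

if module_loaded zfs; then
    echo "zfs module still loaded, use count: $(module_refcount zfs)"
else
    echo "zfs module not loaded"
fi
```

If the module is still listed after the init script's stop action, that fits the picture of DRBD refusing to demote the device.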

After removing the kernel module by hand:
[bash]
[root@distfs1-test ~]# rmmod zfs
[root@distfs1-test ~]# /etc/ha.d/resource.d/drbddisk data stop
[root@distfs1-test ~]# echo $?
0
[/bash]
the drbddisk stop script exited with 0 and everything worked fine.

I ended up fixing the ZFS SysV init script by applying this patch:
[bash]
112a113,115
> # Unload kernel module. A zfs_iput_taskq will still be running, even
> # after calling /etc/init.d/zfs stop. Removing the kernel module resolves this issue.
> /sbin/rmmod zfs
[/bash]
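
The patch calls rmmod unconditionally. A slightly more defensive variant could check first whether the module is loaded and report a failed unload. This is a sketch only, not what the init script ships with: the function name and the RMMOD and MODULES_FILE variables are assumptions introduced here so the logic can be tried without touching the real module.

```bash
#!/bin/bash
# Hypothetical helper mirroring the patched stop path: unload the zfs module
# only if it is currently loaded, and report whether the unload succeeded.
# RMMOD and MODULES_FILE are illustrative overrides, not part of the init script.
RMMOD="${RMMOD:-/sbin/rmmod}"
MODULES_FILE="${MODULES_FILE:-/proc/modules}"

unload_zfs_module() {
    # Nothing to do when the module is not listed.
    if ! grep -q '^zfs ' "$MODULES_FILE" 2>/dev/null; then
        return 0
    fi
    # rmmod fails while the module is still in use,
    # typically because a pool is still imported.
    if ! "$RMMOD" zfs; then
        echo "could not unload zfs module" >&2
        return 1
    fi
    return 0
}

# Usage inside a stop action (not executed here):
#   unload_zfs_module || exit 1
```

Returning a non-zero status on a failed unload would make the problem visible in the Heartbeat log at the zfs step, instead of surfacing later as a drbddisk error.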
