HA NFS server with DRBD, ZFS and Heartbeat on CentOS 6.6 [FIX]

While following this how-to to set up an HA NFS server, I ran into a problem with ZFS during the Heartbeat takeover process.
When I forced the active machine into standby via:

/usr/share/heartbeat/hb_standby

the takeover failed with:

...
ResourceManager(default)[3439]:	2015/07/13_16:22:25 info: Releasing resource group: distfs1-test IPaddr::172.16.10.10/24/eth0 drbddisk::data zfs nfs
ResourceManager(default)[3439]:	2015/07/13_16:22:25 info: Running /etc/init.d/nfs  stop
ResourceManager(default)[3439]:	2015/07/13_16:22:26 info: Running /etc/init.d/zfs  stop
ResourceManager(default)[3439]:	2015/07/13_16:22:26 info: Running /etc/ha.d/resource.d/drbddisk data stop
ResourceManager(default)[3439]:	2015/07/13_16:22:28 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
ResourceManager(default)[3439]:	2015/07/13_16:22:29 info: Retrying failed stop operation [drbddisk::data]
ResourceManager(default)[3439]:	2015/07/13_16:22:29 info: Running /etc/ha.d/resource.d/drbddisk data stop
ResourceManager(default)[3439]:	2015/07/13_16:22:31 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
...

Running the drbddisk resource script by hand produced a more detailed error:

[root@distfs1-test ~]# /etc/ha.d/resource.d/drbddisk data stop
1: State change failed: (-12) Device is held open by someone
Command 'drbdsetup-84 secondary 1' terminated with exit code 11
1: State change failed: (-12) Device is held open by someone
Command 'drbdsetup-84 secondary 1' terminated with exit code 11
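
To confirm that the failed demotion really left the node in the Primary role, the local role can be read from /proc/drbd. The following is only a minimal sketch: the function name drbd_local_role is hypothetical, and it assumes the DRBD 8.4 status layout in which each resource line starts with the minor number and contains a field such as ro:Primary/Secondary.

```shell
# Hypothetical helper, not part of DRBD or Heartbeat.
# Prints the local role of a DRBD minor, read from a /proc/drbd style file.
#   $1 = DRBD minor number (1 in the error above)
#   $2 = status file to read (assumed default: /proc/drbd)
drbd_local_role() {
    awk -v minor="$1" '
        # Resource lines look like " 1: cs:Connected ro:Primary/Secondary ..."
        $1 == (minor ":") {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ro:/) {
                    # ro:<local>/<peer>, keep only the local part
                    split(substr($i, 4), roles, "/")
                    print roles[1]
                    exit
                }
            }
        }
    ' "${2:-/proc/drbd}"
}
```

Calling drbd_local_role 1 right after the failed stop should still print Primary if the device could not be released.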

Further investigation showed that even after running:

/etc/init.d/zfs stop

a ZFS kernel thread was still running:

[root@distfs1-test ~]# ps xa| grep zfs
 4021 ?        S<     0:00 [zfs_iput_taskq/]
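
A plain grep zfs can also match the grep command itself. As a rough sketch, the check can be narrowed to kernel threads, which ps shows in square brackets. The function name zfs_kernel_threads is hypothetical, and the filter assumes the five-column layout of ps xa shown above (PID, TTY, STAT, TIME, COMMAND).

```shell
# Hypothetical helper: reads `ps xa` style lines on stdin and prints
# the PID and name of every kernel thread whose name starts with "zfs".
# Kernel threads appear in [brackets] in the COMMAND column (field 5).
zfs_kernel_threads() {
    awk '$5 ~ /^\[zfs/ { print $1, $5 }'
}
```

Used as ps xa | zfs_kernel_threads, it prints nothing once no such thread is left.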

After removing the kernel module by hand:

[root@distfs1-test ~]# rmmod zfs
[root@distfs1-test ~]# /etc/ha.d/resource.d/drbddisk data stop
[root@distfs1-test ~]# echo $?
0

the stop returned exit code 0 and everything worked fine.

I ended up fixing the ZFS SysV init script with this small patch:

112a113,115
> 	# Unload kernel module. A zfs_iput_taskq will still be running, even
> 	# after calling /etc/init.d/zfs stop. Removing the kernel module resolves this issue.
> 	/sbin/rmmod zfs
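
The patch calls rmmod unconditionally. A slightly more defensive variant could unload the module only if it is actually loaded and report a failure. This is only a sketch and not part of the original init script: module_loaded and unload_zfs are hypothetical names, and the check assumes the usual /proc/modules layout with the module name as the first field.

```shell
# Hypothetical helpers, not part of the shipped zfs init script.

# Succeeds if the named module is listed in the modules file.
#   $1 = module name
#   $2 = modules list to read (assumed default: /proc/modules)
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Unloads the zfs module if it is loaded; returns 1 if rmmod fails.
unload_zfs() {
    if module_loaded zfs; then
        /sbin/rmmod zfs || {
            echo "zfs: could not unload kernel module" >&2
            return 1
        }
    fi
    return 0
}
```

With this variant, a stop on a node where the module was never loaded would not produce an rmmod error.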