Kill the NFS CLIENT processes
jibbajabba
Member Posts: 4,317 ■■■■■■■■□□
Which processes need to be killed to kill the NFS CLIENT (not the server)?
Some say nfs, some portmap, some nfslock, but none of them would release a frozen share.
In my case, for example, it was so bad that I couldn't even 'lazy' unmount the share, as the session froze as soon as I typed the share name. On another server I was able to copy / paste the command, which worked, but that server had a stupid Java remote card which didn't allow copy / paste.
The only workaround here was removing the share from fstab and rebooting the server, but is there another way to kill the client?!
My own knowledge base made public: http://open902.com
Comments
-
senghor Member Posts: 38 ■■□□□□□□□□
Hi Gomjaba,
how are you mounting the NFS? hard,intr I hope....
You can't kill the client until you kill -9 the processes that are using (or trying to use) the share..
you can see the processes with:
fuser /your/share
If that doesn't work, there is a workaround that I use.... let me know. -
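A minimal sketch of the fuser route above, using a throwaway temp file in place of /your/share (a hung NFS mount can't be reproduced here); the 'tail -f' holder process is purely illustrative:

```shell
# List the processes holding a path open, then kill them, as described
# above. A temp file stands in for the NFS share.
target=$(mktemp)
tail -f "$target" >/dev/null 2>&1 &   # background process holding the path open
holder=$!
sleep 1

fuser -v "$target"            # show who is using the path (like 'fuser /your/share')
fuser -k "$target"            # -k sends SIGKILL to every process using the path
wait "$holder" 2>/dev/null || true
rm -f "$target"
```

Note that on an older kernel with a hard mount and no intr, a process blocked on a dead server sits in uninterruptible sleep and ignores even SIGKILL until the server answers, which is why this can fail on a frozen share.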
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
In fstab I got:
192.168.0.29:/shares/mike /backup nfs defaults 0 0
which is then simply mounted with:
mount /backup
[root@mike-1 ~]# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/sda1                   65G  4.4G   57G   8% /
/dev/sdb1                  902G   39G  817G   5% /home
tmpfs                      2.0G     0  2.0G   0% /dev/shm
192.168.0.29:/shares/mike  197G   41G  157G  21% /backup
fuser /backup would not show anything in this case, as nothing is actually using the share until my backup script kicks in.
What happens now is, if say the NAS is completely down, every process trying to access this share hangs.
Whether it is 'df' or anything else. Even yum freezes completely, as it also checks for disk space, which it can't.
When you try to 'umount -l' the share, the ssh session starts to freeze as soon as you type /backup, so the only way is copy / paste, which does work, though not on a remote card which can't copy / paste.
Does this make sense? So what I am trying to achieve is unmounting the share if I can't copy / paste the umount command. -
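Since typing the dead path can hang the shell (tab completion and quoting both stat the mount point), one workaround is to read the mount table instead: /proc/mounts can be read without ever touching the dead server. A minimal sketch; the nfs_mounts / lazy_umount_nfs names and the DRYRUN flag are made up for illustration:

```shell
# List NFS mount points by parsing a mounts table (default /proc/mounts);
# reading the table never stat()s the hung mount point itself.
nfs_mounts() {
    tab="${1:-/proc/mounts}"
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "$tab"
}

# Lazy-unmount every NFS mount found, without ever typing the frozen path.
# DRYRUN is an illustrative safety flag, not a standard option.
lazy_umount_nfs() {
    for mp in $(nfs_mounts "$1"); do
        if [ -n "$DRYRUN" ]; then
            echo "would run: umount -l $mp"
        else
            umount -l "$mp"
        fi
    done
}
```

Run as root for the real unmount; with DRYRUN=1 it only prints what it would do.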
Forsaken_GA Member Posts: 4,024
I'll be honest, if I can't force unmount it, I usually just reboot the box in question to get the share freed up. More often than not, the process that's trying to use it simply will not die with kill -9, and it's faster for me to kick the box over and fix the fallout than to leave it in that state and keep trying to find ways to kill it.
-
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
That is what I do when all else fails. But the box in question also had a couple of DRBD drives worth a few TB, with an uptime of around 700 days. Which was already impressive, but we knew a reboot would force a filesystem check, which would probably have taken weeks. As I also didn't have access to the Internet, I couldn't google how to disable the forced fsck lol. Basically everything that could have gone wrong, did.
-
senghor Member Posts: 38 ■■□□□□□□□□
Gomjaba,
I see that in your fstab you are not handling failures of the server/network.... that is why your clients hang.... I think.
You can handle it in two ways: soft (the magic recipe for corrupted data) and hard (the way to holiness).
Try this:
# device            mountpoint  fs-type  options       dump  fsckord
your.share:/home    /backup     nfs      rw,hard,intr  0     0
You can't kill the process (kind of) if you don't specify intr.
Try it on a test system and use iptables to simulate the disconnection of the NFS server.
One question though.... what happens when the share crashes?.... meaning... why is the share down?... is it a network issue, the NFS server down, or busy?.... Can you reach the NFS server from this client via the network? ICMP, SSH, ....
I'm asking because there are ways to "trick" the client when there seems to be no connection to the NFS server. -
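To check whether an existing mount actually carries intr, you can again parse the mount table rather than touch the share. A sketch; the has_intr helper name is made up:

```shell
# Report whether the NFS mount at a given mount point carries the intr
# option, by parsing a mounts table (default /proc/mounts).
has_intr() {
    mp="$1"
    tab="${2:-/proc/mounts}"
    awk -v mp="$mp" '$2 == mp && $3 ~ /^nfs/ {
        if ($4 ~ /(^|,)intr(,|$)/) print "intr"; else print "no intr"
        exit
    }' "$tab"
}
```

On newer kernels (2.6.25 and later) the intr option is accepted but effectively ignored, since fatal signals can always interrupt an NFS wait; on the older kernels of this era it matters.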
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
In this case the NFS server was down. The storage subsystem had the swine flu and managed to wipe the OS from the boot LUN (at least the LVM LUN was still there).
Cheers for those options .. gonna have a play with it. -
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Just read this:
Mounting an NFS Volume
intr really seems to be the way to go .. Cheers for the pointer. -
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Heh, nice one senghor.
Now added intr as well ..
df obviously still hangs, but it does allow cancelling (CTRL-C) and, even more importantly, allows unmounting the share without crashing the whole console / ssh session ...
Cheers .. -
senghor Member Posts: 38 ■■□□□□□□□□
Cool! -
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Nah, I just shut down the interface with the VLAN used solely for the backup share.
-
senghor Member Posts: 38 ■■□□□□□□□□
Nice -
Forsaken_GA Member Posts: 4,024
Ah, that would do it.
When I was dealing with stubborn NFS mounts, it usually went something like this -
Customer: 'Hey, none of our videos are playing, is something wrong with the server?'
(long story short, front-end web servers mounting NFS shares to serve up videos from a central location so that the content didn't have to be replicated among many servers)
Me: 'Hrm, it looks like the NFS mount is stale, I'll see what I can do'
At this point, I would have about 10 minutes to fix the problem before the customer is on the phone to the president of the company, who would then be riding my ass and reminding me how important a customer this is, so on and so forth.
If I can't get the share restored in that time frame, it's time to employ shotgun diagnostics and reboot the sucker. Telling the customer that the server was hung and needed to be rebooted is something they understand - they're Windows users, after all. -
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
IPTables are for wussies lol
-
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Lol Forsaken, a reboot is a cool solution, a forced file system check of 20TB data drives isn't lol.
Can this actually be turned off on the fly? -
senghor Member Posts: 38 ■■□□□□□□□□
Can this actually be turned off on the fly?
alright...
this is the definition in /etc/fstab:
# device  mount_point  FS_type  options  dump_freq  fsck_order
fsck_order can be set to:
0=ignore
1=first
2-9=second, third, ...
so... Mr Watson, how do you disable fsck of a certain volume before the reboot? -
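For reference, a sketch of the answer in fstab terms: setting the sixth field (fsck_order) to 0 tells the boot-time fsck pass to skip the volume. The device and mount point here are hypothetical:

```
# device     mount_point  FS_type  options   dump_freq  fsck_order
/dev/sdb1    /home        ext3     defaults  0          0
```

Note that, as the rest of the thread shows, the superblock's own check interval and mount count (set via tune2fs) can still trigger a check independently of this field.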
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Are you sure that is still a valid setting in the new kernels? Because the server in question already has the setting at 0 0 and still ran a fsck on boot ....
-
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Ah
[root@test ~]# tune2fs -l /dev/mapper/test
<SNIP>
Filesystem created:       Wed Dec  2 16:23:05 2009
Last mount time:          Sat Apr 24 01:17:36 2010
Last write time:          Sat Apr 24 01:17:36 2010
Mount count:              3
Maximum mount count:      33
Last checked:             Wed Dec  2 16:23:05 2009
Check interval:           15552000 (6 months)
Next check after:         Mon May 31 17:23:05 2010
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
<SNIP>
[root@test ~]# tune2fs -i 0 /dev/mapper/test
tune2fs 1.39 (29-May-2006)
Setting interval between checks to 0 seconds
[root@test ~]# tune2fs -l /dev/mapper/test
<SNIP>
Filesystem created:       Wed Dec  2 16:23:05 2009
Last mount time:          Sat Apr 24 01:17:36 2010
Last write time:          Sun Apr 25 12:17:14 2010
Mount count:              3
Maximum mount count:      33
Last checked:             Wed Dec  2 16:23:05 2009
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
<SNIP>
Guess that'll do ... -
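The same tune2fs knobs can be tried safely on a throwaway file-backed filesystem instead of a real LVM volume. A sketch, assuming e2fsprogs (mke2fs/tune2fs) is installed; the image file is made up. Note that -i 0 only disables the time-based check shown above; the mount-count check (Maximum mount count: 33) needs -c 0 as well:

```shell
# Build a tiny ext2 filesystem in a regular file, then disable both the
# time-based and mount-count-based periodic checks with tune2fs.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=1024 2>/dev/null
mke2fs -F -q "$img"          # -F: force, it's a regular file, not a device
tune2fs -i 0 "$img"          # check interval -> 0 (<none>)
tune2fs -c 0 "$img"          # maximum mount count -> -1 (disabled)
tune2fs -l "$img" | grep -E 'Maximum mount count|Check interval'
```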
Forsaken_GA Member Posts: 4,024
No, I understand. The forced fsck on a very big filesystem would be a severely limiting factor in your case hehe
God, I don't even like rebooting servers with 5TB of storage, it takes reiser forever to mount them -
senghor Member Posts: 38 ■■□□□□□□□□
One thing to add, Gomjaba.
Even with fsck set to 0, if the kernel detects corruption... as minimal as it could be... it will set fsck to 1. So... it will happen!
Just to let you know.
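senghor's caveat can be seen in the superblock itself: next to the periodic-check settings there is a 'Filesystem state' flag, and when it is anything other than clean, fsck runs at boot regardless of the interval and mount-count settings. A sketch on a throwaway image, again assuming e2fsprogs:

```shell
# Inspect the state flag that forces a fsck even when periodic checks
# are "disabled". Fresh filesystems start out clean.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=1024 2>/dev/null
mke2fs -F -q "$img"
tune2fs -l "$img" | grep 'Filesystem state'
```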