Kill the NFS CLIENT processes
jibbajabba
Member Posts: 4,317 ■■■■■■■■□□
Which processes need to be killed to kill the NFS CLIENT (not the server)?
Some say nfs, some portmap, some nfslock, but none of them would release a frozen share.
In my case, for example, it was so bad that I couldn't even 'lazy' unmount the share, as the session froze as soon as I typed the share name. On another server I was able to copy / paste the command, which worked, but that server had a stupid Java remote card which didn't allow copy / paste.
The only workaround here was removing the share from fstab and rebooting the server, but is there another way to kill the client?!
My own knowledge base made public: http://open902.com
Comments
-
senghor Member Posts: 38 ■■□□□□□□□□
Hi Gomjaba,
how are you mounting the NFS? hard,intr I hope....
You can't kill the client until you kill -9 the processes that are using (or trying to use) the share..
you can see the processes with:
fuser /your/share
If that doesn't work, there is a workaround that I use.... let me know. -
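A minimal sketch of the fuser route above, using a throwaway temp file in place of /your/share (a hung NFS mount can't be reproduced here); the 'tail -f' holder process is purely illustrative:

```shell
# List the processes holding a path open, then kill them, as described
# above. A temp file stands in for the NFS share.
target=$(mktemp)
tail -f "$target" >/dev/null 2>&1 &   # background process holding the path open
holder=$!
sleep 1

fuser -v "$target"            # show who is using the path (like 'fuser /your/share')
fuser -k "$target"            # -k sends SIGKILL to every process using the path
wait "$holder" 2>/dev/null || true
rm -f "$target"
```

Note that on an older kernel with a hard mount and no intr, a process blocked on a dead server sits in uninterruptible sleep and ignores even SIGKILL until the server answers, which is why this can fail on a frozen share.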
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
In fstab I got:
192.168.0.29:/shares/mike /backup nfs defaults 0 0
which is then simply mounted with:
mount /backup
[root@mike-1 ~]# df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/sda1                   65G  4.4G   57G   8% /
/dev/sdb1                  902G   39G  817G   5% /home
tmpfs                      2.0G     0  2.0G   0% /dev/shm
192.168.0.29:/shares/mike  197G   41G  157G  21% /backup
fuser /backup would not show anything in this case, as nothing is actually using the share until my backup script kicks in.
What happens now is, if say the NAS is completely down, every process trying to access this share hangs.
Whether it is 'df' or anything else. Even yum freezes completely, as it also checks for disk space, which it can't.
When you try to 'umount -l' the share, the ssh session starts to freeze as soon as you type /backup, so the only way is copy / paste, which does work, though not on a remote card which can't copy / paste.
Does this make sense? So what I am trying to achieve is unmounting the share if I can't copy / paste the umount command. -
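Since typing the dead path can hang the shell (tab completion and quoting both stat the mount point), one workaround is to read the mount table instead: /proc/mounts can be read without ever touching the dead server. A minimal sketch; the nfs_mounts / lazy_umount_nfs names and the DRYRUN flag are made up for illustration:

```shell
# List NFS mount points by parsing a mounts table (default /proc/mounts);
# reading the table never stat()s the hung mount point itself.
nfs_mounts() {
    tab="${1:-/proc/mounts}"
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "$tab"
}

# Lazy-unmount every NFS mount found, without ever typing the frozen path.
# DRYRUN is an illustrative safety flag, not a standard option.
lazy_umount_nfs() {
    for mp in $(nfs_mounts "$1"); do
        if [ -n "$DRYRUN" ]; then
            echo "would run: umount -l $mp"
        else
            umount -l "$mp"
        fi
    done
}
```

Run as root for the real unmount; with DRYRUN=1 it only prints what it would do.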
Forsaken_GA Member Posts: 4,024
I'll be honest, if I can't force unmount it, I usually just reboot the box in question to get the share freed up. More often than not, the process that's trying to use it simply will not die with kill -9, and it's faster for me to kick the box over and fix the fallout than to leave it in that state and keep trying to find ways to kill it.
-
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
That is what I do when all else fails. But the box in question also had a couple of DRBD drives worth a few TB, with an uptime of around 700 days. Which was already impressive, but we knew a reboot would force a filesystem check, which would probably have taken weeks. As I also didn't have access to the Internet, I couldn't google how to disable the forced fsck lol. Basically everything that could have gone wrong, did.
-
senghor Member Posts: 38 ■■□□□□□□□□
Gomjaba,
I see that in your fstab you are not handling failures of the server/network.... that is why your clients hang.... I think.
You can handle it in two ways: soft (the magic recipe for corrupted data) and hard (the way to holiness).
Try this:
# device            mountpoint  fs-type  options       dump  fsckord
your.share:/home    /backup     nfs      rw,hard,intr  0     0
You can't kill the process (kind of) if you don't specify intr.
Try it on a test system and use iptables to simulate the disconnection of the NFS server.
One question though.... what happens when the share crashes?.... meaning... why is the share down?... is it a network issue, the NFS server down, or busy?.... Can you reach the NFS server from this client via the network? ICMP, SSH, ....
I'm asking because there are ways to "trick" the client when there seems to be no connection to the NFS server. -
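To check whether an existing mount actually carries intr, you can again parse the mount table rather than touch the share. A sketch; the has_intr helper name is made up:

```shell
# Report whether the NFS mount at a given mount point carries the intr
# option, by parsing a mounts table (default /proc/mounts).
has_intr() {
    mp="$1"
    tab="${2:-/proc/mounts}"
    awk -v mp="$mp" '$2 == mp && $3 ~ /^nfs/ {
        if ($4 ~ /(^|,)intr(,|$)/) print "intr"; else print "no intr"
        exit
    }' "$tab"
}
```

On newer kernels (2.6.25 and later) the intr option is accepted but effectively ignored, since fatal signals can always interrupt an NFS wait; on the older kernels of this era it matters.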
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
In this case the NFS server was down. The storage subsystem had the swine flu and managed to wipe the OS from the boot LUN (at least the LVM LUN was still there).
Cheers for those options .. gonna have a play with it. -
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Just read this:
Mounting an NFS Volume
intr really seems to be the way to go .. Cheers for the pointer. -
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Heh, nice one senghor.
Now added intr as well ..
df obviously still hangs, but it does allow cancelling (CTRL-C) and, even more importantly, allows unmounting the share without crashing the whole console / ssh session ...
Cheers .. -
senghor Member Posts: 38 ■■□□□□□□□□
Cool! -
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Nah, I just shut down the interface with the VLAN used solely for the backup share.
-
senghor Member Posts: 38 ■■□□□□□□□□
Nice -
Forsaken_GA Member Posts: 4,024
Ah, that would do it.
When I was dealing with stubborn NFS mounts, it usually went something like this -
Customer: 'Hey, none of our videos are playing, is something wrong with the server?'
(long story short, front-end web servers mounting NFS shares to serve up videos from a central location so that the content didn't have to be replicated among many servers)
Me: 'Hrm, it looks like the NFS mount is stale, I'll see what I can do'
At this point, I would have about 10 minutes to fix the problem before the customer is on the phone to the president of the company, who would then be riding my ass and reminding me how important a customer this is, so on and so forth.
If I can't get the share restored in that time frame, it's time to employ shotgun diagnostics and reboot the sucker. Telling the customer that the server was hung and needed to be rebooted is something they understand - they're Windows users, after all. -
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
IPTables are for wussies lol
-
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Lol Forsaken, a reboot is a cool solution, a forced file system check of 20TB data drives isn't lol.
Can this actually be turned off on the fly? -
senghor Member Posts: 38 ■■□□□□□□□□
Can this actually be turned off on the fly?
alright...
this is the definition in /etc/fstab:
# device  mount_point  FS_type  options  dump_freq  fsck_order
fsck_order can be set to:
0=ignore
1=first
2-9=second, third, ...
so... Mr Watson, how do you disable fsck of a certain volume before the reboot? -
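For reference, a sketch of the answer in fstab terms: setting the sixth field (fsck_order) to 0 tells the boot-time fsck pass to skip the volume. The device and mount point here are hypothetical:

```
# device     mount_point  FS_type  options   dump_freq  fsck_order
/dev/sdb1    /home        ext3     defaults  0          0
```

Note that, as the rest of the thread shows, the superblock's own check interval and mount count (set via tune2fs) can still trigger a check independently of this field.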
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Are you sure that is still a valid setting in the new kernels? Because the server in question already has the setting at 0 0 and still ran a fsck on boot ....
-
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Ah
[root@test ~]# tune2fs -l /dev/mapper/test
<SNIP>
Filesystem created:       Wed Dec  2 16:23:05 2009
Last mount time:          Sat Apr 24 01:17:36 2010
Last write time:          Sat Apr 24 01:17:36 2010
Mount count:              3
Maximum mount count:      33
Last checked:             Wed Dec  2 16:23:05 2009
Check interval:           15552000 (6 months)
Next check after:         Mon May 31 17:23:05 2010
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
<SNIP>
[root@test ~]# tune2fs -i 0 /dev/mapper/test
tune2fs 1.39 (29-May-2006)
Setting interval between checks to 0 seconds
[root@test ~]# tune2fs -l /dev/mapper/test
<SNIP>
Filesystem created:       Wed Dec  2 16:23:05 2009
Last mount time:          Sat Apr 24 01:17:36 2010
Last write time:          Sun Apr 25 12:17:14 2010
Mount count:              3
Maximum mount count:      33
Last checked:             Wed Dec  2 16:23:05 2009
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
<SNIP>
Guess that'll do ... -
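The same tune2fs knobs can be tried safely on a throwaway file-backed filesystem instead of a real LVM volume. A sketch, assuming e2fsprogs (mke2fs/tune2fs) is installed; the image file is made up. Note that -i 0 only disables the time-based check shown above; the mount-count check (Maximum mount count: 33) needs -c 0 as well:

```shell
# Build a tiny ext2 filesystem in a regular file, then disable both the
# time-based and mount-count-based periodic checks with tune2fs.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=1024 2>/dev/null
mke2fs -F -q "$img"          # -F: force, it's a regular file, not a device
tune2fs -i 0 "$img"          # check interval -> 0 (<none>)
tune2fs -c 0 "$img"          # maximum mount count -> -1 (disabled)
tune2fs -l "$img" | grep -E 'Maximum mount count|Check interval'
```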
Forsaken_GA Member Posts: 4,024
No, I understand. The forced fsck on a very big filesystem would be a severely limiting factor in your case hehe
God, I don't even like rebooting servers with 5TB of storage, it takes reiser forever to mount them -
senghor Member Posts: 38 ■■□□□□□□□□
One thing to add, Gomjaba.
Even with fsck set to 0, if the kernel detects corruption... as minimal as it could be... it will set fsck to 1. So... it will happen!
Just to let you know.
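senghor's caveat can be seen in the superblock itself: next to the periodic-check settings there is a 'Filesystem state' flag, and when it is anything other than clean, fsck runs at boot regardless of the interval and mount-count settings. A sketch on a throwaway image, again assuming e2fsprogs:

```shell
# Inspect the state flag that forces a fsck even when periodic checks
# are "disabled". Fresh filesystems start out clean.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=1024 2>/dev/null
mke2fs -F -q "$img"
tune2fs -l "$img" | grep 'Filesystem state'
```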