Kill the NFS CLIENT processes

jibbajabba Member Posts: 4,317 ■■■■■■■■□□
Which processes need to be killed to kill the NFS CLIENT (not the server)?

Some say nfs, some portmap, some nfslock, but none of them would release a frozen share.

In my case, for example, it was so bad that I couldn't even 'lazy' unmount the share, as the session froze as soon as I typed the share name. On another server I was able to copy / paste the command, which worked, but this server has a stupid Java remote card which doesn't allow copy / paste.

The only workaround here was removing the share from fstab and rebooting the server, but is there another way to kill the client?!
My own knowledge base made public: http://open902.com :p

Comments

  • senghor Member Posts: 38 ■■□□□□□□□□
    Hi Gomjaba,

    how are you mounting the NFS share? With hard,intr I hope...

    You can't kill the client until you kill -9 the processes that are using (or trying to use) the share.
    You can see the processes with:
    fuser /your/share

    If that doesn't work, there is a workaround that I use (rough sketch below)... let me know.
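    Something like this, for example (just a sketch; the path is a placeholder):
    # -v lists the processes in detail, -m covers everything on the mounted filesystem
    fuser -vm /your/share
    # if they won't die gracefully, -k sends SIGKILL to every process using the mount
    fuser -km /your/share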
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    senghor wrote: »
    Hi Gomjaba,

    how are you mounting the NFS share? With hard,intr I hope...

    You can't kill the client until you kill -9 the processes that are using (or trying to use) the share.
    You can see the processes with:
    fuser /your/share

    If that doesn't work, there is a workaround that I use... let me know.

    in fstab I got
    192.168.0.29:/shares/mike      /backup         nfs     defaults        0 0
    

    Which is then simply mounted with
    mount /backup
    
    [root@mike-1 ~]# df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda1              65G  4.4G   57G   8% /
    /dev/sdb1             902G   39G  817G   5% /home
    tmpfs                 2.0G     0  2.0G   0% /dev/shm
    192.168.0.29:/shares/mike
                          197G   41G  157G  21% /backup
    

    fuser /backup would not show anything in this case as nothing is actually using the share until my backup script kicks in.

    What happens now is, if say the NAS is completely down, every process trying to access this share hangs.

    Whether it is 'df' or anything else. Even yum freezes completely, because it also checks for free disk space - which hangs as well.

    When you try to 'umount -l' the share, the SSH session freezes as soon as you type /backup, so the only way is copy / paste, which does work - just not on a remote card that can't copy / paste.

    Does this make sense? What I am trying to achieve is unmounting the share when I can't copy / paste the umount command - e.g. something like the commands below.
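    E.g. (a sketch - pasted in one go, since typing the path by hand freezes the session):
    umount -f /backup   # force unmount - often still blocks on a hard NFS mount
    umount -l /backup   # lazy unmount - detaches now, cleans up once nothing uses it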
    My own knowledge base made public: http://open902.com :p
  • Forsaken_GA Member Posts: 4,024
    I'll be honest, if I can't force unmount it, I usually just reboot the box in question to get the share freed up. More often than not, the process that's trying to use it simply will not die on kill -9, and it's faster for me to kick the box over and fix the fallout than to leave it in that state and keep hunting for ways to kill it.
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    That is what I do when all else fails. But the box in question also had a couple of DRBD drives worth a few TB, with an uptime of around 700 days. That was already impressive in itself, but we knew a reboot would force a filesystem check, which would probably have taken weeks. As I also didn't have access to the Internet, I couldn't google how to disable the forced fsck lol - basically everything that could have gone wrong, did :)
    My own knowledge base made public: http://open902.com :p
  • senghor Member Posts: 38 ■■□□□□□□□□
    Gomjaba,

    I see that in your fstab you are not handling server/network failures... that is why your clients hang... I think :)

    You can handle it in two ways: soft (the magic recipe for corrupted data) and hard (the way to holiness) :D

    Try this:
    # device             mountpoint  fs-type    options    dump  fsck_order
    
    your.share:/home  /backup   nfs         rw,hard,intr      0     0
    

    You can't (kind of) kill the process if you don't specify intr.

    Try it on a test system and use iptables to simulate a disconnection of the NFS server (sketch at the end of this post).

    One question though... what happens when the share crashes? Meaning: why is the share down? Is it a network issue, or the NFS server down or busy? Can you reach the NFS server from this client via the network (ICMP, SSH, ...)?
    I'm asking because there are ways to "trick" the client when there seems to be no connection to the NFS server.
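    Something along these lines on the client should do for testing (a sketch - 2049 is the standard NFS port, the server IP is the one from your fstab):
    # block traffic from the client to the NFS server
    iptables -A OUTPUT -d 192.168.0.29 -p tcp --dport 2049 -j DROP
    # ... test the hang / CTRL-C behaviour, then remove the rule again
    iptables -D OUTPUT -d 192.168.0.29 -p tcp --dport 2049 -j DROP
    (Add a matching -p udp rule if the mount goes over UDP.)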
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    senghor wrote: »
    Gomjaba,

    I see that in your fstab you are not handling server/network failures... that is why your clients hang... I think :)

    You can handle it in two ways: soft (the magic recipe for corrupted data) and hard (the way to holiness) :D

    Try this:
    # device             mountpoint  fs-type    options    dump  fsck_order
    
    your.share:/home  /backup   nfs         rw,hard,intr      0     0
    

    You can't (kind of) kill the process if you don't specify intr.

    Try it on a test system and use iptables to simulate a disconnection of the NFS server.

    One question though... what happens when the share crashes? Meaning: why is the share down? Is it a network issue, or the NFS server down or busy? Can you reach the NFS server from this client via the network (ICMP, SSH, ...)?
    I'm asking because there are ways to "trick" the client when there seems to be no connection to the NFS server.

    In this case the NFS server was down. The storage subsystem had the swine flu and managed to wipe the OS from the boot LUN (at least the LVM LUN was still there).

    Cheers for those options .. gonna have a play with it :)
    My own knowledge base made public: http://open902.com :p
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    Just read this:

    Mounting an NFS Volume

    intr really seems to be the way to go .. Cheers for the pointer :)
    My own knowledge base made public: http://open902.com :p
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    Heh, nice one senghor.

    Now added intr as well (updated fstab line below) ..

    df obviously still hangs, but it now allows cancelling (CTRL-C) and, even more importantly, allows unmounting the share without crashing the whole console / SSH session ...
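
    For reference, the fstab line now looks something like this:
    192.168.0.29:/shares/mike      /backup         nfs     rw,hard,intr        0 0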

    Cheers ..
    My own knowledge base made public: http://open902.com :p
  • senghor Member Posts: 38 ■■□□□□□□□□
    Gomjaba wrote: »
    Heh, nice one senghor.

    Now added intr as well ..

    df obviously still hangs, but it now allows cancelling (CTRL-C) and, even more importantly, allows unmounting the share without crashing the whole console / SSH session ...

    Cheers ..

    Cool! :D
  • senghor Member Posts: 38 ■■□□□□□□□□
    Oh!
    Did you give it a go with iptables?
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    Nah, I just shut down the interface with the VLAN used solely for the backup share ;)
    My own knowledge base made public: http://open902.com :p
  • senghor Member Posts: 38 ■■□□□□□□□□
    Gomjaba wrote: »
    Nah, I just shut down the interface with the VLAN used solely for the backup share ;)

    Nice :thumbsup:
  • Forsaken_GA Member Posts: 4,024
    Gomjaba wrote: »
    That is what I do when all else fails. But the box in question also had a couple of DRBD drives worth a few TB, with an uptime of around 700 days. That was already impressive in itself, but we knew a reboot would force a filesystem check, which would probably have taken weeks. As I also didn't have access to the Internet, I couldn't google how to disable the forced fsck lol - basically everything that could have gone wrong, did :)

    Ah, that would do it.

    When I was dealing with stubborn NFS mounts, it usually went something like this -

    Customer: 'Hey, none of our videos are playing, is something wrong with the server?'

    (long story short, front-end web servers mounting NFS shares to serve up videos from a central location so that the content didn't have to be replicated among many servers)

    Me: 'Hrm, it looks like the NFS mount is stale, I'll see what I can do'

    At this point, I would have about 10 minutes to fix the problem before the customer is on the phone to the president of the company, who would then be riding my ass and reminding me how important a customer this is, so on and so forth.

    If I can't get the share restored in that time frame, it's time to employ shotgun diagnostics and reboot the sucker. Telling the customer that the server was hung and needed to be rebooted is something they understand - they're Windows users, after all.
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    IPTables are for wussies lol :p
    My own knowledge base made public: http://open902.com :p
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    Lol Forsaken, a reboot is a cool solution - a forced filesystem check of 20TB of data drives isn't lol.

    Can this actually be turned off on the fly?
    My own knowledge base made public: http://open902.com :p
  • senghor Member Posts: 38 ■■□□□□□□□□
    Gomjaba wrote: »

    Can this actually be turned off on the fly?

    alright...

    This is the definition in /etc/fstab:
    # device   mount_point     FS_type      options   dump_freq fsck_order
    

    fsck_order can be set to:
    0=ignore
    1=first
    2-9=second-third...

    So... Mr Watson, how do you disable fsck for a certain volume before the reboot? ;)
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    Are you sure that is still a valid setting with newer kernels? The server in question already has it set to 0 0 and still ran an fsck on boot ....
    My own knowledge base made public: http://open902.com :p
  • jibbajabba Member Posts: 4,317 ■■■■■■■■□□
    Ah
    [root@test ~]# tune2fs -l /dev/mapper/test 
    <SNIP>
    Filesystem created:       Wed Dec  2 16:23:05 2009
    Last mount time:          Sat Apr 24 01:17:36 2010
    Last write time:          Sat Apr 24 01:17:36 2010
    Mount count:              3
    Maximum mount count:      33
    Last checked:             Wed Dec  2 16:23:05 2009 
    Check interval:           15552000 (6 months)
    Next check after:         Mon May 31 17:23:05 2010
    Reserved blocks uid:      0 (user root)
    Reserved blocks gid:      0 (group root)
    <SNIP>
    
    [root@test ~]# tune2fs -i 0 /dev/mapper/test 
    tune2fs 1.39 (29-May-2006)
    Setting interval between checks to 0 seconds
    
    [root@test ~]# tune2fs -l /dev/mapper/test 
    <SNIP>
    Filesystem created:       Wed Dec  2 16:23:05 2009
    Last mount time:          Sat Apr 24 01:17:36 2010
    Last write time:          Sun Apr 25 12:17:14 2010
    Mount count:              3
    Maximum mount count:      33
    Last checked:             Wed Dec  2 16:23:05 2009
    Check interval:           0 (<none>)
    Reserved blocks uid:      0 (user root)
    Reserved blocks gid:      0 (group root)
    <SNIP>
    

    Guess that'll do ...
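
    Worth noting: the check interval is not the only trigger - there is also the maximum mount count (33 above). Something like this should take care of that one too:
    tune2fs -c -1 /dev/mapper/test   # -1 (or 0) disables the mount-count-based check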
    My own knowledge base made public: http://open902.com :p
  • senghor Member Posts: 38 ■■□□□□□□□□
    Gomjaba wrote: »
    Ah
    [root@test ~]# tune2fs -l /dev/mapper/test 
    <SNIP>
    Filesystem created:       Wed Dec  2 16:23:05 2009
    Last mount time:          Sat Apr 24 01:17:36 2010
    Last write time:          Sat Apr 24 01:17:36 2010
    Mount count:              3
    Maximum mount count:      33
    Last checked:             Wed Dec  2 16:23:05 2009 
    Check interval:           15552000 (6 months)
    Next check after:         Mon May 31 17:23:05 2010
    Reserved blocks uid:      0 (user root)
    Reserved blocks gid:      0 (group root)
    <SNIP>
    
    [root@test ~]# tune2fs -i 0 /dev/mapper/test 
    tune2fs 1.39 (29-May-2006)
    Setting interval between checks to 0 seconds
    
    [root@test ~]# tune2fs -l /dev/mapper/test 
    <SNIP>
    Filesystem created:       Wed Dec  2 16:23:05 2009
    Last mount time:          Sat Apr 24 01:17:36 2010
    Last write time:          Sun Apr 25 12:17:14 2010
    Mount count:              3
    Maximum mount count:      33
    Last checked:             Wed Dec  2 16:23:05 2009
    Check interval:           0 (<none>)
    Reserved blocks uid:      0 (user root)
    Reserved blocks gid:      0 (group root)
    <SNIP>
    

    Guess that'll do ...

    :thumbsup:
  • Forsaken_GA Member Posts: 4,024
    Gomjaba wrote: »
    Lol Forsaken, reboot is a cool solution, forced file system check of 20TB data drives isn't lol.

    No, I understand. The forced fsck on a very big filesystem would be a severely limiting factor in your case hehe

    God, I don't even like rebooting servers with 5TB of storage; it takes ReiserFS forever to mount them.
  • senghor Member Posts: 38 ■■□□□□□□□□
    One thing to add, Gomjaba.

    Even with fsck_order set to 0, if the kernel detects corruption... as minimal as it could be... it will flag the filesystem and fsck will run anyway. So... it will happen!

    Just to let you know ;)
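
    You can see whether the kernel has flagged the filesystem with something like:
    tune2fs -l /dev/mapper/test | grep -i 'filesystem state'   # "clean" is fine, anything else forces a check on boot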