Oh boy, Linux server went bang

jibbajabba Member Posts: 4,317
I run a daily Acronis task on a Linux server, but it complained that /usr is not readable / cannot read from source etc.

I ran fsck and it asked me to reboot .. now this:

2wn4nt5.jpg

I know basic stuff about Linux - but it stops right there :(

Can someone give me a hint what the next step would be (if there is one) to hopefully get this one up again?
My own knowledge base made public: http://open902.com :p

Comments

  • rossonieri#1 Member Posts: 799
    hi,

    i think you have a corrupted system over there.
    perhaps that missing libidl.so was deleted by unlinked inode 2523...

    try to repair the system using an emergency boot disk.
    what distro btw?
    the More I know, that is more and More I dont know.
  • jibbajabba Member Posts: 4,317
    RHEL 5.2

    I made the stupid mistake of running fsck while the system was running .. Didn't catch the "do not run when filesystems are mounted" warning - but it's a learning curve, so it's all good. Not a crucial server anyway, and I have a backup of the data .. still want to fix it as a learning exercise .. Gonna try the emergency boot disk later on and "call back" if I am stuck :p
  • UnixGuy Mod Posts: 4,570
    I don't know how this works on Linux, but if it were Solaris, you would have to boot from CD-ROM (emergency disk?) and fsck your file system
    Certs: GSTRT, GPEN, GCFA, CISM, CRISC, RHCE

    Learn GRC! GRC Mastery : https://grcmastery.com 

  • UnixGuy Mod Posts: 4,570
    or you can do fsck in single-user mode ... because you can't unmount /usr, /var or / (the root filesystem) on a running system, that's why you have to fsck the raw disk while the kernel is booted from CD-ROM (i.e. the root filesystem on the disk is unmounted)
  • jibbajabba Member Posts: 4,317
    oops - something you don't want to see when you start a server with the rescue disk

    3522q8k.jpg
  • UnixGuy Mod Posts: 4,570
    sorry man, can't help in Linux
  • tiersten Member Posts: 4,505
    You're not supposed to run fsck on a mounted RW FS but that isn't what caused your problems. You already had issues before doing that and they were major ones.

    With the limited information and assuming it hadn't crashed before or somebody hasn't done anything to it, I'd hazard a guess and say your drive or controller is failing.
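
As a rough illustration of the "don't fsck a filesystem that is mounted read-write" rule, here is a minimal Python sketch that checks a mount table before you touch a device. The function name and the sample table are made up for this example; on a real Linux box you would feed it the contents of /proc/mounts instead.

```python
def mounted_read_write(mounts_text, target):
    """Return True if `target` (a device or a mount point) is listed in
    /proc/mounts-style text with the 'rw' option."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        if target in (device, mount_point) and "rw" in options.split(","):
            return True
    return False


# Hypothetical sample in the format of /proc/mounts (not taken from the thread).
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw 0 0
/dev/sda2 /usr ext3 rw,noatime 0 0
/dev/sda3 /data ext3 ro 0 0
"""

if __name__ == "__main__":
    # On a live system, read the real table instead:
    #   with open("/proc/mounts") as f:
    #       mounts_text = f.read()
    for target in ("/usr", "/dev/sda3", "/dev/sdb1"):
        if mounted_read_write(SAMPLE_MOUNTS, target):
            print(target, "is mounted read-write: do not run fsck on it")
        else:
            print(target, "is not mounted read-write")
```

The check only says whether a repair is unsafe right now; it says nothing about why the filesystem was unreadable in the first place.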
  • rossonieri#1 Member Posts: 799
    hi gomjaba,

    relax - stay cool :) that screen was only a warning, not a big deal,
    just hit ENTER and see whether it actually gave you the correct information.
    enter the shell - and try fdisk /dev/<whatever_disk_you_have_there>

    you do know how to use fdisk, right?
    just print the partition information of the corrupted disk - and if that warning is correct and you no longer have any partitions - i hate to say that you lost them. there is no way AFAIK to recover a lost partition, even though there are reports that some third-party tools are able to do that.

    but - if you can still get the partitions printed out - then there is a chance to copy the libdl.so from another machine and try to fix it using fsck.


    @ unixguy :)
    UnixGuy wrote: »
    sorry man, can't help in Linux

    no offense, but come on - you can do better than that,
    even in solaris - you'd still have to use fdisk, right?

    cheers!!!
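
For anyone curious what "print the partition information" boils down to, here is a minimal Python sketch that reads the four primary entries from the first sector of a disk, assuming a classic MBR (DOS) partition table. It is not how fdisk itself is implemented, and the function names and the demo values (a single Linux partition starting at sector 63) are invented for the example.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446            # the four primary entries start at this byte
ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR sector


def read_mbr_partitions(sector):
    """Return (index, type, start_lba, sector_count) for each non-empty
    primary entry. Raises ValueError if the sector is not an MBR."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("no MBR boot signature (0x55AA)")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        part_type = entry[4]  # 0x00 means the slot is unused
        start_lba, num_sectors = struct.unpack("<II", entry[8:16])
        if part_type != 0:
            partitions.append((i + 1, part_type, start_lba, num_sectors))
    return partitions


def make_demo_sector():
    """Build a made-up MBR with one bootable Linux (type 0x83) partition."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack("<B3sB3sII", 0x80, b"\x00" * 3, 0x83, b"\x00" * 3, 63, 208782)
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = entry
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    # On a real machine (as root) you would read the first sector of the
    # disk instead, e.g. open("/dev/sda", "rb").read(512); the device
    # name here is only an example.
    for index, part_type, start_lba, num_sectors in read_mbr_partitions(make_demo_sector()):
        print("partition %d: type 0x%02x, start %d, %d sectors"
              % (index, part_type, start_lba, num_sectors))
```

An empty list with a valid signature corresponds to the "one big empty disk" case: the table is there but every slot is blank.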
  • darkerosxx Banned Posts: 1,343
    You noticed it said it couldn't find your fstab file, right? I would say boot into rescue mode and recreate your fstab file using your backup or with what you know it should be. Without that, you won't have any mount points, so none of your partitions will get mounted.
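
If you do end up recreating fstab by hand, a quick sanity check of the file can save a failed boot. Below is a minimal Python sketch that parses fstab-style text into its six fields (device, mount point, type, options, dump, fsck pass). The function name and the sample entries are hypothetical and are not the layout of the server in this thread.

```python
def parse_fstab(text):
    """Parse fstab-style text into a list of dicts.
    Raises ValueError on a line with too few or too many fields."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if not 4 <= len(fields) <= 6:
            raise ValueError("line %d: expected 4 to 6 fields, got %d"
                             % (lineno, len(fields)))
        # The dump and fsck-pass fields are optional and default to 0.
        fields += ["0"] * (6 - len(fields))
        spec, mount_point, fstype, options, dump, passno = fields
        entries.append({
            "spec": spec,
            "mount_point": mount_point,
            "type": fstype,
            "options": options.split(","),
            "dump": int(dump),
            "pass": int(passno),
        })
    return entries


# Hypothetical example file, for illustration only.
SAMPLE_FSTAB = """\
# device    mount point  type  options   dump  pass
/dev/sda1   /            ext3  defaults  1     1
/dev/sda2   /usr         ext3  defaults  1     2
/dev/sda3   swap         swap  defaults
"""

if __name__ == "__main__":
    for entry in parse_fstab(SAMPLE_FSTAB):
        print(entry["spec"], "->", entry["mount_point"], "(%s)" % entry["type"])
```

This only checks the shape of each line; it does not verify that the devices exist or that the filesystem types are right.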
  • jibbajabba Member Posts: 4,317
    Well thanks guys, but even fdisk shows one big empty disk

    Guess it is reinstalling time ...
  • UnixGuy Mod Posts: 4,570
    lol .. I look miserable tho, calling myself UNIX guy and unable to help .. I should change my nickname


    no, you don't use fdisk in Solaris when it's on SPARC (which is like 90% of the time anyway), you use fdisk in the unlikely situation of running Solaris on an x86 architecture.

    I didn't want to suggest a solution because I really don't know how this "rescue" disk works, so I don't know why this message appeared. his production machine is not for R&D I guess, and I don't know how the initial problem happened.

    yes, I don't have linux experience yet :)
  • jibbajabba Member Posts: 4,317
    Well ... 4am here and server rebuilt lol ...
  • UnixGuy Mod Posts: 4,570
    oh God, good job !

    how did this problem happen ?
  • jibbajabba Member Posts: 4,317
    UnixGuy wrote: »
    oh God, good job !

    how did this problem happen ?

    Pure stupidity. We have a few Linux servers running Acronis and recently they all stopped working with a Read Error. So we THOUGHT that the FS was bust for some reason (it started off with just one server).

    So I thought - fsck - that'll do .. However, I am a n00b when it comes to stuff like that. BUT before I tried that on that particular live system from a customer - I tried my own server, which hosts a few forums. I knew if it did go bang - there would be no "harm" apart from whinging members, and nobody pays for the server anyway (apart from my boss lol).

    So I ran fsck while the system was running, ignoring all the warnings that you shouldn't do that on mounted filesystems, and got slapped with a stick :):)

    Yes it was stupid - but the good thing is: I will probably NEVER EVER do that again on a live system - that is for sure :)
  • UnixGuy Mod Posts: 4,570
    lool so you had a nice time last night

    but do you know what the cause of the Read error is?
  • jibbajabba Member Posts: 4,317
    UnixGuy wrote: »
    lool so you had a nice time last night

    but do you know what the cause of the Read error is?

    Nope, we have a ticket open with Acronis as it is clearly not the FS. We tested several other systems "the right way" and they all came back green .. So the problem is def. Acronis ..
  • jibbajabba Member Posts: 4,317
    Gomjaba wrote: »
    oops - something you don't want to see when you start a server with the rescue disk

    3522q8k.jpg

    LOL - I JUST realized that I was REALLY stupid / retarded ....

    NO WONDER it didn't find an installation - at that point the server was still running CentOS and not RHEL, but I used the RHEL DVD. DOOOOH :D
  • tiersten Member Posts: 4,505
    Gomjaba wrote: »
    LOL - I JUST realized that I was REALLY stupid / retarded ....

    NO WONDER it didn't find an installation - at that point the server was still running CentOS and not RHEL, but I used the RHEL DVD. DOOOOH :D

    CentOS = RHEL without support and from a third party.

    Any recent Linux distribution would have detected it as a valid partition. It might not have detected it as the same distribution, but it would know it is a Linux partition.
  • jibbajabba Member Posts: 4,317
    tiersten wrote: »
    CentOS = RHEL without support and from a third party.

    Any recent Linux distribution would have detected it as a valid partition. It might not have detected it as the same distribution, but it would know it is a Linux partition.

    Good point ...