URGENT - inaccessible datastore
Essendon
Member Posts: 4,546 ■■■■■■■■■■
I have a 3-node cluster located at a remote site. Last evening, all of a sudden, one vol changed its partition type from VMFS to Win95 FAT32. You'd think this was signatured by a Windows server, but this environment has been untouched. This is out of the blue. The VMs are still running and are accessible, but I cannot browse the datastore; no files show up in there.
Check out the following screenshot:
A colleague's logged a call with VMware and they want to muck around with the partition table. The VMs cannot be shut down without an outage. I don't have a backup of the VMs (don't ask me why, I got handed this site) in case VMware declare the table corrupted, which will probably mean a loss of the VMs. The partition was GPT, and the other partitions are GPT too.
Hurriedly written post guys, so bear with me. What can I do to troubleshoot, lemme know if you need more info to be able to help me out.
Environment: ESXi 5, HP iSCSI storage. 2 hosts at first, the 3rd host was added a few days ago and no problems were detected then.
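For anyone following along, the "partition type" reported here is read from the partition table at the start of the LUN. Below is a minimal Python sketch of decoding the four legacy MBR entries, assuming you already have a read-only copy of the first 512-byte sector (how to get it is not shown and is not from the original post). On a GPT disk the MBR normally holds a single protective entry of type 0xEE; 0x0B and 0x0C are the Win95 FAT32 types, and 0xFB is the legacy VMFS type. The function names and the sample sector are made up for illustration.

```python
import struct

# A few well-known MBR partition type bytes relevant to this thread.
MBR_TYPES = {
    0x00: "empty",
    0x0B: "Win95 FAT32",
    0x0C: "Win95 FAT32 (LBA)",
    0xEE: "GPT protective",
    0xFB: "VMware VMFS",
}


def parse_mbr(sector: bytes):
    """Return (disk_signature, entries) decoded from a 512-byte MBR sector."""
    if len(sector) < 512:
        raise ValueError("need at least 512 bytes")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("boot signature 0x55AA not found")
    # The 4-byte disk signature sits at offset 440.
    (disk_signature,) = struct.unpack_from("<I", sector, 440)
    entries = []
    for index in range(4):
        offset = 446 + 16 * index  # four 16-byte entries start at offset 446
        type_byte = sector[offset + 4]
        start_lba, sector_count = struct.unpack_from("<II", sector, offset + 8)
        entries.append({
            "number": index + 1,
            "type_byte": type_byte,
            "type_name": MBR_TYPES.get(type_byte, "unknown"),
            "start_lba": start_lba,
            "sector_count": sector_count,
        })
    return disk_signature, entries


def make_sample_sector(type_byte, disk_signature=0x12345678):
    """Build a synthetic sector for demonstration (hypothetical data only)."""
    sector = bytearray(512)
    struct.pack_into("<I", sector, 440, disk_signature)
    sector[446 + 4] = type_byte
    struct.pack_into("<II", sector, 446 + 8, 1, 2048)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    signature, entries = parse_mbr(make_sample_sector(0x0C))
    print(f"disk signature: {signature:#010x}")
    for entry in entries:
        print(entry["number"], hex(entry["type_byte"]), entry["type_name"])
```

The sketch only reads and interprets bytes. It never writes to anything, which matters when the VMs on the LUN are still running.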
Comments
-
Essendon Member Posts: 4,546 ■■■■■■■■■■
I cannot vMotion/SvMotion the VMs off the datastore. It just fails with the following errors:
- Failed to initialize migration at source. Error 195887111. Bad parameter
- Timed out waiting for migration data
Removing and re-adding the datastore didn't help either.
-
sratakhin Member Posts: 818
4GB?!
I don't know what to suggest, but I'd do Windows Server Backups (or use any other agent-based backups) just in case...
-
sratakhin Member Posts: 818
By the way, can you SSH to the host and try to copy files off the datastore, if it's still accessible in the console?
-
JBrown Member Posts: 308
How large are the VMs supposed to be? Would VMware Converter work in this case, meaning can you give converting/moving to another node/datastore a try? Or how about cloning the vmdks with vmkfstools to another datastore, and then recreating the vmx file from scratch?
VMware KB: Cloning and converting virtual machine disks with vmkfstools
I have not tested vmkfstools on a live VM, so no promises there.
-
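The vmkfstools clone suggested above boils down to one command per disk. Here is a minimal Python sketch that only builds the command line and does not run it. The datastore and VM names are hypothetical, and `-d thin` is just one possible destination format. As the post says, this is untested against a live VM, and it also needs the source datastore to be readable from the host.

```python
import posixpath
import shlex


def clone_command(src_vmdk, dst_dir, disk_format="thin"):
    """Build a `vmkfstools -i` clone command for one virtual disk.

    The destination keeps the source file name inside dst_dir.
    """
    dst_vmdk = posixpath.join(dst_dir, posixpath.basename(src_vmdk))
    return ["vmkfstools", "-i", src_vmdk, dst_vmdk, "-d", disk_format]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    src = "/vmfs/volumes/old_datastore/fileserver/fileserver.vmdk"
    dst = "/vmfs/volumes/new_datastore/fileserver"
    print(shlex.join(clone_command(src, dst)))
```

Building the command as a list keeps paths with spaces intact; `shlex.join` is used only to print a copy-pastable form.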
kj0 Member Posts: 767
Was playing with vmkfstools last night; this should hopefully work.
First things I thought of trying were to SSH in and then use vmkfstools.
-
Essendon Member Posts: 4,546 ■■■■■■■■■■
Done the following already, gents:
- SSH to the host: can't open the volume
- vmkfstools doesn't work either, the volume isn't VMFS any more! And then there are live VMs on there, including an active file server.
Converter is an option, looking into that.
-
kj0 Member Posts: 767
Right Click > Delete from Disk = Problem solved.
:P
Seriously though, how did this happen? Are there logs saying what changed it?
-
Essendon Member Posts: 4,546 ■■■■■■■■■■
Yeah, narrowed it down a little: this cluster had a new expansion unit put in. This LUN's on that unit.
-
jibbajabba Member Posts: 4,317 ■■■■■■■■□□
If you can't even touch the VMs, and given my recent support experience, I'd try to save the VMs themselves without shutting them down. There is a chance they won't even boot back up.
Install VMware Converter inside the VM, or download a trial from Acronis, and recreate them on another datastore. Something tells me this isn't going to end well otherwise.
My own knowledge base made public: http://open902.com
-
dave330i Member Posts: 2,091 ■■■■■■■■■■
Are you sure someone didn't accidentally reformat that LUN? Had that happen to me when a Win admin was trying to add another disk to a fileshare.
I wish you luck in trying to save them. Odds are, all your effort will probably fail. Like jibba said, don't power these VMs off. They won't restart.
If you can still access the guest OS, I would build replacement VMs and transfer as much as possible.
2018 Certification Goals: Maybe VMware Sales Cert
"Simplify, then add lightness" -Colin Chapman
-
Essendon Member Posts: 4,546 ■■■■■■■■■■
Replacement VMs built (and working). There were 3 that the Converter couldn't work with; it errored out saying VMs with GPT partitions need to be powered down. VMware concluded that a Windows server signatured that LUN somehow, and we're trying to determine who/what did that. They seem to have washed their hands of it because those 3 remaining VMs cannot be powered down and they need them turned off to be able to stuff around with the partition table.
It also turned out the problem started when the 3rd host was added to the cluster. I imagine someone installed Windows first and then ESXi. They may have also tried to add a LUN somewhere in between or after. Thanks for the suggestions guys, no VMs were powered off for fear of losing them forever.
Moral of the story - Back up your freaking VMs!!
-
dave330i Member Posts: 2,091 ■■■■■■■■■■
Good to hear that you were able to save some of your VMs.
-
JBrown Member Posts: 308
I will go ahead and guess that those 3 VMs contain Dynamic disks. I had a similar issue, with Converter refusing to convert a 2 TB VM serving as a file share server; a consultant had created a few VMs and set the vmdks up as Dynamic disks. Don't ask why, that was long before I joined the team.
Btw, what would happen if you added storage to that datastore and tried to extend it with VMware? Would it kill the current FAT32 resignatured storage, or resig it and bring back the "hidden" treasures/VMs? I would love to test this scenario out, but I am out of the country at the moment. Please let us know if you test it in your lab.
PS: I am glad you were able to recover at least some of your VMs.